Abstract

In  previous  work,  we  presented  Typed  Assembly  Language  (TAL).  TAL  is  sufficiently  expressive  to 
serve  as  a  target  language  for  compilers  of  high-level  languages  such  as  ML.  That  work  assumed 
such  a  compiler  would  perform  a  continuation-passing  style  transform  and  eliminate  the  control 
stack  by  heap-allocating  activation  records.  However,  most  compilers  are  based  on  stack  allocation. 
This  paper  presents  STAL,  an  extension  of  TAL  with  stack  constructs  and  stack  types  to  support 
the  stack  allocation  style.  We  show  that  STAL  is  sufficiently  expressive  to  support  languages  such 
as  Java,  Pascal,  and  ML;  constructs  such  as  exceptions  and  displays;  and  optimizations  such  as  tail 
call  elimination  and  callee-saves  registers.  This  paper  also  formalizes  the  typing  connection  between 
CPS-based  compilation  and  stack-based  compilation  and  illustrates  how  STAL  can  formally  model 
calling  conventions  by  specifying  them  as  formal  translations  of  source  function  types  to  STAL 
types. 

This material is based on work supported in part by the AFOSR grant F49620-97-1-0013, ARPA/RADC grant
F30602-96-1-0317, ARPA/AF grant F30602-95-1-0047, AASERT grant N00014-95-1-0985, and ARPA grant
F19628-95-C-0050. Any opinions, findings, and conclusions or recommendations expressed in this publication are those of
the authors and do not reflect the views of these agencies.
1 Introduction and Motivation


Statically  typed  source  languages  have  efficiency  and  software  engineering  advantages  over  their 
dynamically  typed  counterparts.  Modern  type-directed  compilers  [18,  25,  7,  32,  19,  29,  11]  exploit 
the  properties  of  typed  languages  more  extensively  than  their  predecessors  by  preserving  type 
information  computed  in  the  front  end  through  a  series  of  typed  intermediate  languages.  These 
compilers  use  types  to  direct  sophisticated  transformations  such  as  closure  conversion  [17,  31,  16, 
4,  20],  region  inference  [8],  subsumption  elimination  [9,  10],  and  unboxing  [18,  22,  28].  In  many 
cases,  without  types,  these  transformations  are  less  effective  or  simply  impossible.  Furthermore, 
the  type  translation  partially  specifies  the  corresponding  term  translation  and  often  captures  the 
critical  concerns  in  an  elegant  and  succinct  fashion.  Strong  type  systems  not  only  describe  but 
also  enforce  many  important  invariants.  Consequently,  developers  of  type-based  compilers  may 
invoke  a  typechecker  after  each  code  transformation,  and  if  the  output  fails  to  type-check,  the 
developer  knows  that  the  compiler  contains  an  internal  error.  Although  typecheckers  for  decidable 
type  systems  cannot  catch  all  compiler  errors,  they  have  proven  themselves  valuable  debugging 
tools  in  practice  [21]. 

Despite the numerous advantages of compiling with types, until recently, no compiler propagated
type information through the final stages of code generation. The TIL/ML compiler, for instance,
preserves types through approximately 80% of compilation but leaves the remaining 20% untyped.
Many of the complex tasks of code generation, including register allocation and instruction
scheduling, are left unchecked, and types cannot be used to specify or explain these low-level code
transformations.

These  observations  motivated  our  exploration  of  very  low-level  type  systems  and  corresponding 
compiler  technology.  In  Morrisett  et  al.  [23],  we  presented  a  typed  assembly  language  (TAL)  and 
proved  that  its  type  system  was  sound  with  respect  to  an  operational  semantics.  We  demonstrated 
the  expressiveness  of  this  type  system  by  sketching  a  type-preserving  compiler  from  an  ML-like 
language  to  TAL.  The  compiler  ensured  that  well-typed  source  programs  were  always  mapped  to 
well-typed  assembly  language  programs  and  that  they  preserved  source  level  abstractions  such  as 
user-defined  abstract  data  types  and  closures.  Furthermore,  we  claimed  that  the  type  system  of  TAL 
did  not  interfere  with  many  traditional  compiler  optimizations  including  inlining,  loop-unrolling, 
register  allocation,  instruction  selection,  and  instruction  scheduling. 

However,  the  compiler  we  presented  was  critically  based  on  a  continuation-passing  style  (CPS) 
transform,  which  eliminated  the  need  for  a  control  stack.  In  particular,  activation  records  were 
represented by heap-allocated closures as in the SML of New Jersey compiler (SML/NJ) [5, 3]. For
example, Figure 2 shows the TAL code our heap-based compiler would produce for the recursive
factorial computation. Each function takes an additional argument which represents the control
stack as a continuation closure. Instead of “returning” to the caller, a function invokes its
continuation closure by jumping directly to the code of the closure, passing the environment of the
closure and the result in registers.

Allocating  continuation  closures  on  the  heap  has  many  advantages  over  a  conventional  stack-based 
implementation.  First,  it  is  straightforward  to  implement  control  primitives  such  as  exceptions, 
first-class  continuations,  or  user-level  lightweight  coroutine  threads  when  continuations  are  heap 
allocated  [3,  31,  34].  Second,  Appel  and  Shao  [2]  have  shown  that  heap  allocation  of  closures  can 
have  better  space  properties,  primarily  because  it  is  easier  to  share  environments.  Third,  there 
is a unified memory management mechanism (namely the garbage collector) for allocating and
collecting  all  kinds  of  objects,  including  stack  frames.  Finally,  Appel  and  Shao  [2]  have  argued 
that,  at  least  for  SML/NJ,  the  locality  lost  by  heap-allocating  stack  frames  is  negligible. 

Nevertheless,  there  are  also  compelling  reasons  for  providing  support  for  stacks.  First,  Appel 
and Shao’s work did not consider imperative languages, such as Java, where the ability to share
environments is greatly reduced, nor did it consider languages that do not require garbage collection.
Second,  Tarditi  and  Diwan  [13,  12]  have  shown  that  with  some  cache  architectures,  heap  allocation 
of  continuations  (as  in  SML/NJ)  can  have  substantial  overhead  due  to  a  loss  of  locality.  Third, 
stack-based  activation  records  can  have  a  smaller  memory  footprint  than  heap-based  activation 
records.  Finally,  many  machine  architectures  have  hardware  mechanisms  that  expect  programs 
to  behave  in  a  stack-like  fashion.  For  example,  the  Pentium  Pro  processor  has  an  internal  stack 
that  it  uses  to  predict  return  addresses  for  procedures  so  that  instruction  pre-fetching  will  not  be 
stalled  [15].  The  internal  stack  is  guided  by  the  use  of  call/return  primitives  which  use  the  standard 
control  stack. 

Clearly,  compiler  writers  must  weigh  a  complex  set  of  factors  before  choosing  stack  allocation, 
heap  allocation,  or  both.  The  target  language  should  not  constrain  those  design  decisions.  In  this 
paper,  we  explore  the  addition  of  a  stack  to  our  typed  assembly  language  in  order  to  give  compiler 
writers  the  flexibility  they  need.  Our  stack  typing  discipline  is  remarkably  simple,  but  powerful 
enough  to  compile  languages  such  as  Pascal,  Java,  or  ML  without  adding  high-level  primitives 
to  the  assembly  language.  More  specifically,  the  typing  discipline  supports  stack  allocation  of 
temporary  variables  and  values  that  do  not  escape,  stack  allocation  of  procedure  activation  frames, 
exception  handlers,  and  displays,  as  well  as  optimizations  such  as  callee-saves  registers.  Unlike 
the  JVM  architecture  [19],  our  system  does  not  constrain  the  stack  to  have  the  same  size  at  each 
control-flow  point,  nor  does  it  require  new  high-level  primitives  for  procedure  call/return.  Instead, 
our  assembly  language  continues  to  have  low-level  RISC-like  primitives  such  as  loads,  stores,  and 
jumps.  However,  source-level  stack  allocation,  general  source-level  stack  pointers,  general  pointers 
into  either  the  stack  or  heap,  and  some  advanced  optimizations  cannot  be  typed. 

A key contribution of the type structure is that it provides a unifying declarative framework for
specifying procedure calling conventions regardless of the allocation strategy. In addition, the
framework further elucidates the connection between a heap-based continuation-passing style
compiler and a conventional stack-based compiler. In particular, this type structure makes explicit the
notion that the only differences between the two styles are that, instead of passing the continuation
as a boxed, heap-allocated tuple, a stack-based compiler passes the continuation unboxed in
registers and the environments for continuations are allocated on the stack. The general framework
makes it easy to transfer transformations developed for one style to the other. For instance, we
can easily explain the callee-saves registers of SML/NJ [5, 3, 1] and the callee-saves registers of a
stack-based compiler as instances of a more general CPS transformation that is independent of the
continuation representation.


2  Overview  of  TAL  and  CPS-Based  Compilation 


We  begin  with  an  overview  of  our  original  typed  assembly  language  in  the  absence  of  stacks, 
and  sketch  how  a  polymorphic  functional  language,  such  as  ML,  can  be  compiled  to  TAL  in  a 
continuation-passing style where continuations are heap-allocated.
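$$
\begin{array}{llcl}
\text{types} & \tau & ::= & \alpha \mid \mathit{int} \mid \forall[\Delta].\Gamma \mid \langle \tau_1^{\varphi_1}, \ldots, \tau_n^{\varphi_n} \rangle \mid \exists\alpha.\tau \\
\text{initialization flags} & \varphi & ::= & 0 \mid 1 \\
\text{label assignments} & \Psi & ::= & \{\ell_1{:}\tau_1, \ldots, \ell_n{:}\tau_n\} \\
\text{type assignments} & \Delta & ::= & \cdot \mid \alpha, \Delta \\
\text{register assignments} & \Gamma & ::= & \{r_1{:}\tau_1, \ldots, r_n{:}\tau_n\} \\
\text{registers} & r & ::= & \mathtt{r1} \mid \mathtt{r2} \mid \cdots \\
\text{word values} & w & ::= & \ell \mid i \mid\ ?\tau \mid w[\tau] \mid \mathtt{pack}\ [\tau, w]\ \mathtt{as}\ \tau' \\
\text{small values} & v & ::= & r \mid w \mid v[\tau] \mid \mathtt{pack}\ [\tau, v]\ \mathtt{as}\ \tau' \\
\text{heap values} & h & ::= & \langle w_1, \ldots, w_n \rangle \mid \mathtt{code}[\Delta]\Gamma.I \\
\text{heaps} & H & ::= & \{\ell_1 \mapsto h_1, \ldots, \ell_n \mapsto h_n\} \\
\text{register files} & R & ::= & \{r_1 \mapsto w_1, \ldots, r_n \mapsto w_n\} \\
\text{instructions} & \iota & ::= & \mathit{aop}\ r_d, r_s, v \mid \mathit{bop}\ r, v \mid \mathtt{ld}\ r_d, r_s(i) \mid \mathtt{malloc}\ r[\vec{\tau}] \mid {} \\
 & & & \mathtt{mov}\ r_d, v \mid \mathtt{st}\ r_d(i), r_s \mid \mathtt{unpack}\ [\alpha, r_d], v \\
\text{arithmetic ops} & \mathit{aop} & ::= & \mathtt{add} \mid \mathtt{sub} \mid \mathtt{mul} \\
\text{branch ops} & \mathit{bop} & ::= & \mathtt{beq} \mid \mathtt{bneq} \mid \mathtt{bgt} \mid \mathtt{blt} \mid \mathtt{bgte} \mid \mathtt{blte} \\
\text{instruction sequences} & I & ::= & \iota; I \mid \mathtt{jmp}\ v \mid \mathtt{halt}[\tau] \\
\text{programs} & P & ::= & (H, R, I)
\end{array}
$$

Figure 1: Syntax of TAL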


Figure 1 gives the syntax for TAL. A TAL program ($P$) is a triple consisting of a heap, a register
file,  and  an  instruction  sequence.  A  register  file  is  a  mapping  of  registers  to  word-sized  values.  A 
heap  is  a  mapping  of  labels  to  heap  values  (values  larger  than  a  word),  which  are  tuples  and  code 
sequences. 

The instruction set consists mostly of conventional RISC-style assembly operations, including
arithmetic, branches, loads, and stores. One exception, the unpack [$\alpha$, r], v instruction, unpacks a value
v having existential type, binding $\alpha$ to its hidden type in the instructions that follow, and placing
the underlying value in register r. On an untyped machine, where the moving of types is
immaterial, this can be implemented by a simple move instruction. The other non-standard instruction is
malloc,  which  allocates  memory  in  the  heap.  On  a  conventional  machine,  this  instruction  would  be 
replaced  by  the  appropriate  code  to  allocate  memory.  Evaluation  of  TAL  programs  is  specified  as 
a  deterministic  small-step  operational  semantics  that  maps  programs  to  programs  (details  appear 
in  Morrisett  et  al.  [23]). 

The unusual types in TAL are for tuples and code blocks. Tuple types contain initialization flags
(either 0 or 1) that indicate whether or not components have been initialized. For example, if
register r has type $\langle \mathit{int}^0, \mathit{int}^1 \rangle$, then it contains a label bound in the heap to a pair that can contain
integers, where the first component may not have been initialized, but the second component has.
In this context, the type system allows the second component to be loaded, but not the first. If an
integer value is stored into r(0) then afterwards r has the type $\langle \mathit{int}^1, \mathit{int}^1 \rangle$, reflecting the fact that
the first component is now initialized. The instruction malloc r[$\tau_1, \ldots, \tau_n$] heap-allocates a new
tuple with uninitialized fields and places its label in register r.
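
As an illustration only, the following Python sketch mimics how initialization flags evolve under malloc, st, and ld. It is not part of the TAL formalism: the names `TupleType`, `malloc`, `check_store`, `check_load`, and `IllTyped` are hypothetical, and types are represented simply as strings.

```python
from dataclasses import dataclass


class IllTyped(Exception):
    """Raised when an access would be rejected by this sketched checker."""


@dataclass
class TupleType:
    """A tuple type: one [field_type, init_flag] entry per component."""
    fields: list  # init_flag is 0 (uninitialized) or 1 (initialized)


def malloc(field_types):
    """Type of a freshly allocated tuple: every initialization flag is 0."""
    return TupleType([[t, 0] for t in field_types])


def check_load(ty, i):
    """A load from component i is allowed only if its flag is 1."""
    name, flag = ty.fields[i]
    if flag != 1:
        raise IllTyped(f"component {i} is not initialized")
    return name


def check_store(ty, i, value_type):
    """A store of a matching type yields a new type with flag i set to 1."""
    name, _ = ty.fields[i]
    if name != value_type:
        raise IllTyped(f"expected {name}, got {value_type}")
    new_fields = [list(f) for f in ty.fields]
    new_fields[i][1] = 1
    return TupleType(new_fields)
```

Starting from `malloc(["int", "int"])`, a load from either component is rejected until a store to that component has set its flag, which mirrors the example in the text.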
Code types ($\forall[\alpha_1, \ldots, \alpha_n].\Gamma$) describe code blocks (code[$\alpha_1, \ldots, \alpha_n$]$\Gamma.I$), which are made from
instruction sequences $I$ that expect a register file of type $\Gamma$. In other words, $\Gamma$ serves as a register
file pre-condition that must hold before control may be transferred to the code block. Code blocks
have no post-condition because control is either terminated via a halt instruction or transferred
to another code block. The type variables $\alpha_1, \ldots, \alpha_n$ are bound (and abstract) in $\Gamma$ and $I$, and are
instantiated at the call site to the function. As usual, we consider alpha-equivalent expressions to
be identical; however, register names are not bound variables and do not alpha-vary. We also
consider label assignments, register assignments, heaps, and register files equivalent when they differ
only in the orderings of their fields. When $\Delta$ is empty, we often abbreviate $\forall[\Delta].\Gamma$ as simply $\Gamma$.

The  type  variables  that  are  abstracted  in  a  code  block  provide  a  means  to  write  polymorphic  code 
sequences.  For  example,  the  polymorphic  code  block 

    code[α]{r1:α, r2:∀[].{r1:⟨α^1, α^1⟩}}.
        malloc r3[α, α]
        st r3(0), r1
        st r3(1), r1
        mov r1, r3
        jmp r2

roughly corresponds to a CPS version of the ML function fn (x:$\alpha$) => (x, x). The block expects upon
entry that register r1 contains a value of the abstract type $\alpha$, and r2 contains a return address (or
continuation label) of type $\forall[].\{\mathtt{r1}{:}\langle \alpha^1, \alpha^1 \rangle\}$. In other words, the return address requires register r1
to contain an initialized pair of values of type $\alpha$ before control can be returned to this address. The
instructions of the code block allocate a tuple, store into the tuple two copies of the value in r1,
move the pointer to the tuple into r1 and then jump to the return address in order to “return” the
tuple to the caller. If the code block is bound to a label $\ell$, then it may be invoked by simultaneously
instantiating the type variable and jumping to the label (e.g., jmp $\ell$[int]).
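
For readers more comfortable with a high-level language, the same control flow can be sketched in Python. This is only an analogy: the function name `pair` is hypothetical, Python is dynamically typed, and `return k(...)` is an ordinary call rather than a jump.

```python
def pair(x, k):
    """CPS analogue of fn x => (x, x): the result is handed to k, not returned."""
    tup = [None, None]    # malloc r3[a, a]: two uninitialized fields
    tup[0] = x            # st r3(0), r1
    tup[1] = x            # st r3(1), r1
    return k(tuple(tup))  # mov r1, r3; jmp r2
```

Calling `pair(3, lambda p: p)` passes the pair `(3, 3)` to the identity continuation, which plays the role of the return address in r2.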

Source  languages  like  ML  have  nested  higher-order  functions  that  might  contain  free  variables  and 
thus  require  closures  to  represent  functions.  At  the  TAL  level,  we  represent  closures  as  a  pair 
consisting  of  a  code  block  label  and  a  pointer  to  an  environment  data  structure.  The  type  of  the 
environment  must  be  held  abstract  in  order  to  avoid  typing  difficulties  [20],  and  thus  we  pack  the 
type  of  the  environment  and  the  pair  to  form  an  existential  type. 

All functions, including continuation functions introduced during CPS conversion, are thus
represented as existentials. For example, once CPS converted, a source function of type $\mathit{int} \to \langle\rangle$ has
type $(\mathit{int}, (\langle\rangle \to \mathit{void})) \to \mathit{void}$.$^1$ Then, after closures are introduced, the code has type:

$$\exists\alpha_1.((\alpha_1, \mathit{int}, \exists\alpha_2.((\alpha_2, \langle\rangle) \to \mathit{void}, \alpha_2)) \to \mathit{void}, \alpha_1)$$

Finally, at the TAL level the function will be represented by a value with the type:

$$\exists\alpha_1.\langle \forall[].\{\mathtt{r1}{:}\alpha_1,\ \mathtt{r2}{:}\mathit{int},\ \mathtt{r3}{:}\exists\alpha_2.\langle \forall[].\{\mathtt{r1}{:}\alpha_2,\ \mathtt{r2}{:}\langle\rangle\}^1, \alpha_2^1 \rangle\}^1, \alpha_1^1 \rangle$$

Here, $\alpha_1$ is the abstracted type of the closure’s environment. The code for the closure requires that
the environment be passed in register r1, the integer argument in r2, and the continuation in r3.
The continuation is itself a closure where $\alpha_2$ is the abstracted type of its environment. The code
for the continuation closure requires that the environment be passed in r1 and the unit result of
the computation in r2.

$^1$The void return types are intended to suggest the non-returning aspect of CPS functions.

To apply a closure at the TAL level, we first use the unpack operation to open the existential
package.  Then  the  code  and  the  environment  of  the  closure  pair  are  loaded  into  appropriate 
registers,  along  with  the  argument  to  the  function.  Finally,  we  use  a  jump  instruction  to  transfer 
control  to  the  closure’s  code. 
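
The closure calling sequence can be sketched in Python under the assumption that a closure is a (code, environment) pair. All names below (`apply_closure`, `apply_cont`, `add_code`, `make_adder`, `halt`) are hypothetical, and the existential packaging has no counterpart in an untyped setting, so unpacking reduces to taking the pair apart.

```python
def apply_closure(closure, arg, k):
    """Apply a function closure to arg with continuation closure k."""
    code, env = closure       # unpack, then load code and environment
    return code(env, arg, k)  # jump to the closure's code


def apply_cont(k, result):
    """Invoke a continuation closure with a result."""
    code, env = k
    return code(env, result)


def add_code(env, x, k):
    """Code of a closure whose environment is the captured addend."""
    return apply_cont(k, env + x)


def make_adder(n):
    """Build a closure: a code pointer paired with its environment."""
    return (add_code, n)


def halt_code(env, result):
    """Top-level continuation: simply yields the result."""
    return result


halt = (halt_code, ())
```

For example, `apply_closure(make_adder(5), 2, halt)` opens the closure, passes the environment 5 and the argument 2 to `add_code`, and delivers 7 to the halt continuation.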

Figure  2  gives  the  CPS-based  TAL  code  for  the  following  ML  expression  which  computes  the 
factorial  of  6: 


    let fun fact n =
          if n = 0 then 1
          else n * fact (n-1)
    in
      fact 6
    end
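
Before reading the TAL code, it may help to see the same heap-based, continuation-passing structure in a high-level sketch. The Python below is an illustration under the assumption that closures are (code, environment) pairs; the function names echo the labels of Figure 2, while the wrapper `fact` is hypothetical.

```python
def l_halt(env, result):
    """Final continuation: halt with the result."""
    return result


def l_cont(env, result):
    """Continuation of the recursive call; result holds (n - 1)!."""
    n, k = env                       # retrieve n and the saved continuation k
    code, k_env = k                  # open k: project its code and environment
    return code(k_env, n * result)   # pass n * (n - 1)! to k


def l_fact(env, n, k):
    """CPS factorial: env is the (empty) environment, k the continuation."""
    if n == 0:
        code, k_env = k
        return code(k_env, 1)        # zero branch: call k with 1
    cont = (l_cont, (n, k))          # heap-allocate environment and closure
    return l_fact((), n - 1, cont)   # recursive call with the new continuation


def fact(n):
    """Start the computation with the halt continuation."""
    return l_fact((), n, (l_halt, ()))
```

Each recursive call allocates a fresh continuation closure holding n and the previous continuation, so the chain of closures on the heap takes the place of a control stack.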


3  Stacks 


In this section, we show how to extend TAL to obtain a Stack-Based Typed Assembly Language
(STAL).  Figure  3  defines  the  new  syntactic  constructs  for  the  language.  In  what  follows,  we 
informally  discuss  the  dynamic  and  static  semantics  for  the  modified  language,  leaving  formal 
treatment  to  Appendix  A. 


3.1 Basic Developments

Operationally we model stacks ($S$) as lists of word-sized values. There are four new instructions
that manipulate the stack. The salloc n instruction enlarges the stack by n words. The new
stack slots are uninitialized, which we formalize by filling them with nonsense words (ns). On a
conventional machine, assuming stacks grow toward lower addresses, an salloc operation would
correspond to subtracting n from the stack pointer. The sfree n instruction removes the top n
words from the stack, and corresponds to adding n to the stack pointer. The sld r, sp(i) instruction
loads the i-th word (from zero) of the stack into register r, whereas the sst sp(i), r instruction stores
register r into the i-th word.

A program becomes stuck if it attempts to execute:

• sfree n and the stack does not contain at least n words, or

• sld r, sp(i) or sst sp(i), r and the stack does not contain at least i + 1 words.

As usual, a type safety theorem (Theorem A.1) dictates that no well-formed program can become
stuck.
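
The dynamic behaviour of the four stack instructions, including the stuck states, can be sketched as a small Python model. This is an informal illustration rather than the formal semantics of Appendix A; the class `Machine`, the exception `Stuck`, and the sentinel `NS` are hypothetical names.

```python
class Stuck(Exception):
    """Raised where the abstract machine would have no applicable step."""


NS = object()  # stands for the nonsense word ns


class Machine:
    def __init__(self):
        self.stack = []  # index 0 is the top of the stack
        self.regs = {}   # register name -> word

    def salloc(self, n):
        """Enlarge the stack by n uninitialized (nonsense) words."""
        self.stack[:0] = [NS] * n

    def sfree(self, n):
        """Remove the top n words; stuck if fewer than n are present."""
        if n > len(self.stack):
            raise Stuck("sfree on a stack that is too short")
        del self.stack[:n]

    def sld(self, r, i):
        """Load the i-th word (from zero) of the stack into register r."""
        if i + 1 > len(self.stack):
            raise Stuck("sld past the end of the stack")
        self.regs[r] = self.stack[i]

    def sst(self, i, r):
        """Store register r into the i-th word (from zero) of the stack."""
        if i + 1 > len(self.stack):
            raise Stuck("sst past the end of the stack")
        self.stack[i] = self.regs[r]
```

In this model the stuck states are exactly the two cases listed above; the type system is what guarantees that a well-formed program never reaches them.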
(H, {}, I) where

H =
l_fact:
    code[]{r1:⟨⟩, r2:int, r3:τk}.
        bneq r2, l_nonzero
        unpack [α, r3], r3            % zero branch: call k (in r3) with 1
        ld r4, r3(0)                  % project k code
        ld r1, r3(1)                  % project k environment
        mov r2, 1
        jmp r4                        % jump to k
l_nonzero:
    code[]{r1:⟨⟩, r2:int, r3:τk}.
        sub r4, r2, 1                 % n - 1
        malloc r5[int, τk]            % create environment for cont in r5
        st r5(0), r2                  % store n into environment
        st r5(1), r3                  % store k into environment
        malloc r3[∀[].{r1:⟨int^1, τk^1⟩, r2:int}, ⟨int^1, τk^1⟩]
                                      % create cont closure in r3
        mov r2, l_cont
        st r3(0), r2                  % store cont code
        st r3(1), r5                  % store environment ⟨n, k⟩
        mov r2, r4                    % arg := n - 1
        mov r3, pack [⟨int^1, τk^1⟩, r3] as τk
                                      % abstract the type of the environment
        jmp l_fact                    % recursive call
l_cont:
    code[]{r1:⟨int^1, τk^1⟩, r2:int}.  % r2 contains (n - 1)!
        ld r3, r1(0)                  % retrieve n
        ld r4, r1(1)                  % retrieve k
        mul r2, r3, r2                % n × (n - 1)!
        unpack [α, r4], r4            % unpack k
        ld r3, r4(0)                  % project k code
        ld r1, r4(1)                  % project k environment
        jmp r3                        % jump to k
l_halt:
    code[]{r1:⟨⟩, r2:int}.
        mov r1, r2
        halt[int]                     % halt with result in r1

and I =
        malloc r1[]                   % create empty environment (⟨⟩)
        malloc r2[]                   % create another empty environment
        malloc r3[∀[].{r1:⟨⟩, r2:int}, ⟨⟩]
                                      % create halt closure in r3
        mov r4, l_halt
        st r3(0), r4                  % store cont code
        st r3(1), r2                  % store environment ⟨⟩
        mov r2, 6                     % load argument (6)
        mov r3, pack [⟨⟩, r3] as τk   % abstract the type of the environment
        jmp l_fact                    % begin fact with
                                      % {r1 = ⟨⟩, r2 = 6, r3 = haltcont}

and τk = ∃α.⟨∀[].{r1:α, r2:int}^1, α^1⟩

Figure 2: Typed Assembly Code for Factorial
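$$
\begin{array}{llcl}
\text{types} & \tau & ::= & \cdots \mid \mathit{ns} \\
\text{stack types} & \sigma & ::= & \rho \mid \mathit{nil} \mid \tau{::}\sigma \\
\text{type assignments} & \Delta & ::= & \cdots \mid \rho, \Delta \\
\text{register assignments} & \Gamma & ::= & \{r_1{:}\tau_1, \ldots, r_n{:}\tau_n, \mathtt{sp}{:}\sigma\} \\
\text{word values} & w & ::= & \cdots \mid w[\sigma] \mid \mathit{ns} \\
\text{small values} & v & ::= & \cdots \mid v[\sigma] \\
\text{register files} & R & ::= & \{r_1 \mapsto w_1, \ldots, r_n \mapsto w_n, \mathtt{sp} \mapsto S\} \\
\text{stacks} & S & ::= & \mathit{nil} \mid w{::}S \\
\text{instructions} & \iota & ::= & \cdots \mid \mathtt{salloc}\ n \mid \mathtt{sfree}\ n \mid \mathtt{sld}\ r_d, \mathtt{sp}(i) \mid \mathtt{sst}\ \mathtt{sp}(i), r_s
\end{array}
$$

Figure 3: Additions to TAL for Simple Stacks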


Stacks are classified by stack types ($\sigma$), which include nil and $\tau{::}\sigma$. The former describes the empty
stack and the latter describes a stack of the form $w{::}S$ where $w$ has type $\tau$ and $S$ has type $\sigma$.
Stack types also include stack type variables ($\rho$), which may be used to abstract the tail of a stack
type. The ability to abstract stack types is critical for supporting procedure calls and is discussed
in detail later.

As before, the register file for the abstract machine is described by a register file type ($\Gamma$) mapping
registers to types. However, $\Gamma$ also maps the distinguished register sp to a stack type $\sigma$. Finally,
code blocks and code types support polymorphic abstraction over both types and stack types. In
the interest of clarity, from time to time we will give registers names (such as ra or re) instead of
numbers.

One of the uses of the stack is to save temporary values during a computation. The general
problem is to save on the stack n registers, say $r_1$ through $r_n$, of types $\tau_1$ through $\tau_n$, perform some
computation e, and then restore the temporary values to their respective registers. This would be
accomplished by the following instruction sequence where the comments (delimited by %) show the
stack’s type at each step of the computation.


                            % σ
    salloc n                % ns::ns::···::ns::σ
    sst sp(0), r1           % τ1::ns::···::ns::σ
      ⋮
    sst sp(n-1), rn         % τ1::τ2::···::τn::σ
    code for e              % τ1::τ2::···::τn::σ
    sld r1, sp(0)           % τ1::τ2::···::τn::σ
      ⋮
    sld rn, sp(n-1)         % τ1::τ2::···::τn::σ
    sfree n                 % σ

If, upon entry, $r_i$ has type $\tau_i$ and the stack is described by $\sigma$, and if the code for e leaves the state
of the stack unchanged, then this code sequence is well-typed. Furthermore, the typing discipline
does not place constraints on the order in which the stores or loads are performed.
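
The way the stack type evolves through this sequence can be traced with a short Python sketch. It is a simplification under stated assumptions, not the typing rules of Appendix A: a stack type is a tuple of type names whose last element stands for the tail, and all function names and the names "t1", "t2", and "sigma" are hypothetical.

```python
class IllTyped(Exception):
    """Raised when a step would be rejected by this sketched checker."""


def known_slots(sigma):
    """Number of slots with known types (everything before the tail)."""
    return len(sigma) - 1


def salloc(sigma, n):
    """New slots hold nonsense, so they get type ns."""
    return ("ns",) * n + sigma


def sst(sigma, i, tau):
    """Storing a value of type tau into slot i updates that slot's type."""
    if i >= known_slots(sigma):
        raise IllTyped(f"sst sp({i}) reaches past the known slots")
    return sigma[:i] + (tau,) + sigma[i + 1:]


def sld(sigma, i):
    """Loading slot i gives the destination register that slot's type."""
    if i >= known_slots(sigma):
        raise IllTyped(f"sld sp({i}) reaches past the known slots")
    return sigma[i]


def sfree(sigma, n):
    """Only slots with known types may be freed."""
    if n > known_slots(sigma):
        raise IllTyped(f"sfree {n} reaches past the known slots")
    return sigma[n:]


def save_restore(reg_types, sigma, order=None):
    """Return the stack type after each step of the save/restore sequence."""
    n = len(reg_types)
    order = list(range(n)) if order is None else list(order)
    trace = [sigma]
    s = salloc(sigma, n)
    trace.append(s)
    for i in order:
        s = sst(s, i, reg_types[i])
        trace.append(s)
    # The code for e is assumed to leave the stack type unchanged.
    for i in order:
        if sld(s, i) != reg_types[i]:
            raise IllTyped(f"slot {i} does not hold the saved type")
        trace.append(s)
    s = sfree(s, n)
    trace.append(s)
    return trace
```

Passing a different `order` changes the intermediate stack types but not the final one, which reflects the remark that the order of stores and loads is unconstrained.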

It  is  straightforward  to  model  higher-level  primitives,  such  as  push  and  pop.  The  former  can  be 
seen  as  simply  salloc  1  followed  by  a  store  to  sp(0),  whereas  the  latter  is  a  load  from  sp(0)  followed 
by sfree 1. Also, a “jump-and-link” or “call” instruction which automatically moves the return
address  into  a  register  or  onto  the  stack  can  be  synthesized  from  our  primitives.  To  simplify  the 
presentation,  we  did  not  include  these  instructions  in  STAL;  a  practical  implementation,  however, 
would  need  a  full  set  of  instructions  appropriate  to  the  architecture. 


3.2  Stack  Polymorphism 

The  stack  is  commonly  used  to  save  the  current  return  address,  and  temporary  values  across 
procedure  calls.  Which  registers  to  save  and  in  what  order  is  usually  specified  by  a  compiler- 
specific  calling  convention.  Here  we  consider  a  simple  calling  convention  where  it  is  assumed  that 
there  is  one  integer  argument  and  one  unit  result,  both  of  which  are  passed  in  register  rl,  and 
that  the  return  address  is  passed  in  the  register  ra.  When  invoked,  a  procedure  may  choose  to 
place  temporaries  on  the  stack  as  shown  above,  but  when  it  jumps  to  the  return  address,  the  stack 
should  be  in  the  same  state  as  it  was  upon  entry.  Naively,  we  might  expect  the  code  for  a  function 
obeying  this  calling  convention  to  have  the  following  STAL  type: 

$$\{\mathtt{r1}{:}\mathit{int},\ \mathtt{sp}{:}\sigma,\ \mathtt{ra}{:}\{\mathtt{r1}{:}\langle\rangle,\ \mathtt{sp}{:}\sigma\}\}$$

Notice  that  the  type  of  the  return  address  is  constrained  so  that  the  stack  must  have  the  same 
shape  upon  return  as  it  had  upon  entry.  Hence,  if  the  procedure  pushes  any  arguments  onto  the 
stack,  it  must  pop  them  off. 

However,  this  typing  is  unsatisfactory  for  two  important  reasons: 


•  Nothing  prevents  the  function  from  popping  off  values  from  the  stack  and  then  pushing  new 
values  (of  the  appropriate  type)  onto  the  stack.  In  other  words,  the  caller’s  stack  frame  is  not 
protected  from  the  function’s  code. 

•  Such a function can only be invoked from states where the entire stack is described exactly
by $\sigma$. This effectively limits invocation of the procedure to a single, pre-determined point in
the execution of the program. For example, there is no way for a procedure to push its return
address onto the stack and to jump to itself (i.e., to recurse).


The solution to both problems is to abstract the type of the stack using a stack type variable:

$$\forall[\rho].\{\mathtt{r1}{:}\mathit{int},\ \mathtt{sp}{:}\rho,\ \mathtt{ra}{:}\{\mathtt{r1}{:}\mathit{int},\ \mathtt{sp}{:}\rho\}\}$$

To invoke a function having this type, the caller must instantiate the bound stack type variable $\rho$
with the current type of the stack. As before, the function can only jump to the return address
when the stack is in the same state as it was upon entry.

This mechanism addresses the first problem because the type checker treats $\rho$ as an abstract stack
type while checking the body of the code. Hence, the code cannot perform an sfree, sld, or sst
on the stack. It must first allocate its own space on the stack, only this space may be accessed by
the function, and the space must be freed before returning to the caller.$^2$

$^2$Some intuition on this topic may be obtained from Reynolds’s theorem on parametric polymorphism [27] but a
formal proof is difficult.
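
Both effects of stack polymorphism can be illustrated with a small Python sketch. It is a simplification under stated assumptions: only the sp component of the function's type is modelled, a stack type is a tuple of type names ending in a tail ("nil" or a variable name), and the names `instantiate`, `check_call`, `sfree`, `ENTRY_SP`, and `RETURN_SP` are hypothetical.

```python
class IllTyped(Exception):
    """Raised when a step would be rejected by this sketched checker."""


# sp components of a type of the form forall[rho].{..., sp: rho, ra: {..., sp: rho}}
ENTRY_SP = ("rho",)
RETURN_SP = ("rho",)


def instantiate(sigma, var, actual):
    """Replace the tail variable var of sigma by the stack type actual."""
    if sigma[-1] == var:
        return sigma[:-1] + actual
    return sigma


def check_call(current_sp, actual):
    """Check a jump to the function with rho := actual.

    Returns the stack type that the return address must expect.
    """
    if instantiate(ENTRY_SP, "rho", actual) != current_sp:
        raise IllTyped("caller's stack does not match the instantiated type")
    return instantiate(RETURN_SP, "rho", actual)


def sfree(sigma, n):
    """Only slots with known types may be freed; an abstract tail is opaque."""
    if n > len(sigma) - 1:
        raise IllTyped("cannot free slots hidden behind a stack type variable")
    return sigma[n:]
```

A call from the empty stack instantiates the variable with `("nil",)`, while a call from a frame that has saved two words instantiates it with a longer type; inside the body, `sfree(("rho",), 1)` is rejected because nothing is known about the caller's portion of the stack.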
(H, {sp ↦ nil}, I) where

H =
l_fact:
    code[ρ]{r1:⟨⟩, r2:int, sp:ρ, ra:τρ}.
        bneq r2, l_nonzero[ρ]         % if n = 0 continue
        mov r1, 1                     % result is 1
        jmp ra                        % return
l_nonzero:
    code[ρ]{r1:⟨⟩, r2:int, sp:ρ, ra:τρ}.
        sub r3, r2, 1                 % n - 1
        salloc 2                      % allocate stack space for n and the return address
        sst sp(0), r2                 % save n
        sst sp(1), ra                 % save return address
        mov r2, r3
        mov ra, l_cont[ρ]
        jmp l_fact[int::τρ::ρ]        % recursive call to fact with n - 1,
                                      % abstracting saved data atop the stack
l_cont:
    code[ρ]{r1:int, sp:int::τρ::ρ}.
        sld r2, sp(0)                 % restore n
        sld ra, sp(1)                 % restore return address
        sfree 2
        mul r1, r2, r1                % n × (n - 1)!
        jmp ra                        % return
l_halt:
    code[]{r1:int, sp:nil}.
        halt[int]

and I =
        malloc r1[]                   % create empty environment
        mov r2, 6                     % argument
        mov ra, l_halt                % return address for initial call
        jmp l_fact[nil]

and τρ = ∀[].{r1:int, sp:ρ}

Figure 4: STAL Factorial Example


The  second  problem  is  also  solved  because  the  stack  type  variable  may  be  instantiated  in  multiple 
different  ways.  Hence  multiple  call  sites  with  different  stack  states,  including  recursive  calls,  may 
now  invoke  the  function.  In  fact,  a  recursive  call  will  usually  instantiate  the  stack  variable  with 
a  different  type  than  the  original  call  because,  unless  it  is  a  tail-call,  it  will  need  to  store  its  own 
return  address  on  the  stack. 

Figure 4 gives stack-based code for the factorial program. The function is invoked by moving its
environment (an empty tuple, since factorial has no free variables) into r1, the argument into r2,
and the return address label into ra and jumping to the label l_fact. Notice that the nonzero
branch must save the argument and current return address on the stack before jumping to the
fact label in a recursive call. In so doing, the code must use stack polymorphism to account for
its additions to the stack.
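
To make the run-time behaviour of the stack-based code concrete, the following Python sketch interprets the four code blocks of Figure 4 by hand, using an explicit list as the stack and strings as labels. It is an informal model only: types are not tracked, and the function name `run_fact` is hypothetical.

```python
def run_fact(n):
    """Simulate the stack-based factorial; returns (result, final stack)."""
    stack = []                        # sp starts at nil; index 0 is the top
    r1, r2, ra = (), n, "l_halt"      # environment, argument, return address
    label = "l_fact"
    while True:
        if label == "l_fact":
            if r2 != 0:               # bneq r2, l_nonzero
                label = "l_nonzero"
                continue
            r1 = 1                    # result is 1
            label = ra                # return
        elif label == "l_nonzero":
            r3 = r2 - 1               # n - 1
            stack[:0] = [None, None]  # salloc 2
            stack[0] = r2             # save n
            stack[1] = ra             # save return address
            r2 = r3
            ra = "l_cont"
            label = "l_fact"          # recursive call
        elif label == "l_cont":
            r2 = stack[0]             # restore n
            ra = stack[1]             # restore return address
            del stack[:2]             # sfree 2
            r1 = r2 * r1              # n * (n - 1)!
            label = ra                # return
        else:                         # l_halt: result is in r1
            return r1, stack
```

The stack grows by two words per recursive call and is back to empty when the halt label is reached, matching the requirement sp:nil in the type of l_halt.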
3.3 Calling Conventions


It  is  interesting  to  note  that  the  stack-based  code  is  quite  similar  to  the  heap-based  code  of  Figure  2. 
In  a  sense,  the  stack-based  code  remains  in  a  continuation-passing  style,  but  instead  of  passing  the 
continuation  as  a  heap-allocated  tuple,  the  environment  of  the  continuation  is  passed  in  the  stack 
pointer  and  the  code  of  the  continuation  is  passed  in  the  return  address  register.  To  more  fully 
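appreciate the correspondence, consider the type of the TAL version of l_fact from Figure 2:

$$\{\mathtt{r1}{:}\langle\rangle,\ \mathtt{r2}{:}\mathit{int},\ \mathtt{ra}{:}\exists\alpha.\langle \{\mathtt{r1}{:}\alpha,\ \mathtt{r2}{:}\mathit{int}\}^1, \alpha^1 \rangle\}$$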

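We could have used an alternative approach where the continuation closure is passed unboxed in
separate registers. To do so, the function’s type must perform the duty of abstracting $\alpha$, since the
continuation’s code and environment must each still refer to the same $\alpha$:

$$\forall[\alpha].\{\mathtt{r1}{:}\langle\rangle,\ \mathtt{r2}{:}\mathit{int},\ \mathtt{ra}{:}\{\mathtt{r1}{:}\alpha,\ \mathtt{r2}{:}\mathit{int}\},\ \mathtt{ra'}{:}\alpha\}$$

Now recall the type of the corresponding STAL code:

$$\forall[\rho].\{\mathtt{r1}{:}\langle\rangle,\ \mathtt{r2}{:}\mathit{int},\ \mathtt{ra}{:}\{\mathtt{sp}{:}\rho,\ \mathtt{r1}{:}\mathit{int}\},\ \mathtt{sp}{:}\rho\}$$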

These types are essentially the same! Indeed, the only difference between continuation-passing
execution and stack-based execution is that in stack-based execution continuations are unboxed and
their environments are allocated on the stack. This connection is among the folklore of
continuation-passing compilers, but the similarity of the two types in STAL summarizes the connection
particularly succinctly.

The  STAL  types  discussed  above  each  serve  the  purpose  of  formally  specifying  a  procedure  calling 
convention,  specifying  the  usage  of  the  registers  and  stack  on  entry  to  and  return  from  a  procedure. 
In  each  of  the  above  calling  conventions,  the  environment,  argument,  and  result  are  passed  in 
registers.  We  also  can  specify  that  the  environment,  argument,  return  address,  and  the  result  are 
all  passed  on  the  stack.  In  this  calling  convention,  the  factorial  function  has  type  (remember  that 
the  convention  for  the  result  is  given  by  the  type  of  the  return  address): 

∀[ρ].{sp : ⟨⟩::int::{sp:int::ρ}::ρ}


These  types  do  not  constrain  optimizations  that  respect  the  given  calling  conventions.  For  instance, 
tail-calls  can  be  eliminated  in  CPS  (the  first  two  conventions)  simply  by  forwarding  the  continuation 
to  the  next  function.  In  a  stack-based  system  (the  second  two),  the  type  system  similarly  allows 
us  (if  necessary)  to  pop  the  current  activation  frame  off  the  stack  and  to  push  arguments  before 
performing  the  tail-call.  Furthermore,  the  type  system  is  expressive  enough  to  type  this  resetting 
and  adjusting  for  any  kind  of  tail-call,  not  just  a  tail-call  to  self. 

Types  may  express  more  complex  conventions  as  well.  For  example,  callee-saves  registers  (registers 
whose  values  must  be  preserved  across  function  calls)  can  be  handled  in  the  same  fashion  as  the 
stack  pointer:  A  function’s  type  abstracts  the  type  of  the  callee-saves  register  and  provides  that 
the  register  have  the  same  type  upon  return.  For  instance,  if  we  wish  to  preserve  register  r3  across 
a  call  to  factorial,  we  would  use  the  type: 

∀[ρ, α].{r1:⟨⟩, r2:int, r3:α, ra:{sp:ρ, r1:int, r3:α}, sp:ρ}

Alternatively, with boxed, heap-allocated closures, we would use the type:

∀[α].{r1:⟨⟩, r2:int, r3:α, ra:∃β.⟨{r1:β, r2:int, r3:α}^1, β^1⟩}

This  is  the  type  that  corresponds  to  the  callee-saves  protocol  of  Appel  and  Shao  [1].  Again  the 
close  correspondence  holds  between  the  stack-  and  heap-oriented  types.  Indeed,  either  one  can  be 
obtained  mechanically  from  the  other.  Thus  this  correspondence  allows  transformations  developed 
for  heap-based  compilers  to  be  used  in  traditional  stack-based  compilers  and  vice  versa. 


4  Exceptions 


We  now  consider  how  to  implement  exceptions  in  STAL.  We  will  find  that  a  calling  convention 
for  function  calls  in  the  presence  of  exceptions  may  be  derived  from  the  heap-based  CPS  calling 
convention,  just  as  was  the  case  without  exceptions.  However,  implementing  this  calling  convention 
will  require  that  the  type  system  be  made  more  expressive  by  adding  compound  stack  types.  This 
additional  expressiveness  will  turn  out  to  have  uses  beyond  exceptions,  allowing  a  variety  of  sorts 
of  pointers  into  the  midst  of  the  stack. 


4.1  Exception  Calling  Conventions 

In  a  heap-based  CPS  framework,  exceptions  are  implemented  by  passing  two  continuations:  the 
usual  continuation  and  an  exception  continuation.  Code  raises  an  exception  by  jumping  to  the 
latter.  For  an  integer  to  unit  function,  this  calling  convention  is  expressed  as  the  following  TAL 
type  (ignoring  the  outer  closure  and  environment): 

{r1:int, ra:∃α1.⟨{r1:α1, r2:⟨⟩}^1, α1^1⟩, re:∃α2.⟨{r1:α2, r2:exn}^1, α2^1⟩}

As before, the caller could unbox the continuations:

∀[α1, α2].{r1:int, ra:{r1:α1, r2:⟨⟩}, ra':α1, re:{r1:α2, r2:exn}, re':α2}

Then the caller might (erroneously) attempt to place the continuation environments on stacks, as
before:

∀[ρ1, ρ2].{r1:int, ra:{sp:ρ1, r1:⟨⟩}, sp:ρ1, re:{sp:ρ2, r1:exn}, sp':ρ2}

Unfortunately,  this  calling  convention  uses  two  stack  pointers,  and  there  is  only  one  stack.  Observe, 
though,  that  the  exception  continuation’s  stack  is  necessarily  a  tail  of  the  ordinary  continuation’s 
stack.  This  observation  leads  to  the  following  calling  convention  for  exceptions  with  stacks: 

∀[ρ1, ρ2].{sp:ρ1 ∘ ρ2, r1:int, ra:{sp:ρ1 ∘ ρ2, r1:⟨⟩},
          re:{sp:ρ2, r1:exn}, res:ptr(ρ2)}

This type uses the notion of a compound stack: when σ1 and σ2 are stack types, the compound
stack type σ1 ∘ σ2 is the result of appending the two types. Thus, in the above type, the function is
presented with a stack with type ρ1 ∘ ρ2, all of which is expected by the regular continuation, but
only a tail of which (ρ2) is expected by the exception continuation. Since ρ1 and ρ2 are quantified,
the function may still be used for any stack so long as the exception continuation accepts some tail
of that stack.

types         τ ::= ··· | ptr(σ)
stack types   σ ::= ··· | σ1 ∘ σ2
word values   w ::= ··· | ptr(i)
instructions  ι ::= ··· | mov rd, sp | mov sp, rs | sld rd, rs(i) | sst rd(i), rs

Figure 5: Additions to TAL for Compound Stacks

To raise an exception, the exception is placed in r1 and control is transferred to the exception
continuation. This requires cutting the actual stack down to just that expected by the exception
continuation. Since the length of ρ1 is unknown, this cannot be done by sfree. Instead, a pointer
to the desired position in the stack is supplied in res, and is moved into sp. The type ptr(σ) is the
type of pointers into the stack at a position where the stack has type σ. Such pointers are obtained
simply by moving sp into a register.

4.2  Compound  Stacks 

The additional syntax to support compound stacks is summarized in Figure 5. The type constructs
σ1 ∘ σ2 and ptr(σ) were discussed above. The word value ptr(i) is used by the operational semantics
to represent pointers into the stack; the element pointed to is i words from the bottom of the stack.
Of course, on a real machine, such a value would be implemented by an actual pointer. The
instructions mov rd, sp and mov sp, rs save and restore the stack pointer, and the instructions
sld rd, rs(i) and sst rd(i), rs allow for loading from and storing to pointers.
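
As an informal illustration of these four instructions (not part of the formal development), the following Python sketch models the stack as a list and a pointer ptr(i) as a recorded stack height. The class name Machine, the method names, and the reading of ptr(i) as "the stack held i words when the pointer was taken" are assumptions made for this sketch only. The usage at the end mimics raising an exception: the front of the stack is discarded by moving a saved pointer back into sp.

```python
class Machine:
    """Toy stack machine; every name here is hypothetical and for illustration only."""

    def __init__(self, stack):
        self.stack = list(stack)  # index 0 is the bottom, the last element is the top
        self.regs = {}

    def _index(self, reg, offset):
        # Translate "pointer in reg, offset words below it" into a list index.
        tag, i = self.regs[reg]
        idx = i - 1 - offset
        if tag != "ptr" or i > len(self.stack) or idx < 0:
            raise ValueError("invalid stack pointer or offset")
        return idx

    def mov_rd_sp(self, rd):
        # mov rd, sp: record the current stack position as ptr(i)
        self.regs[rd] = ("ptr", len(self.stack))

    def mov_sp_rs(self, rs):
        # mov sp, rs: cut the stack back to the recorded position
        tag, i = self.regs[rs]
        if tag != "ptr" or i > len(self.stack):
            raise ValueError("invalid stack pointer")
        del self.stack[i:]

    def sld(self, rd, rs, offset):
        # sld rd, rs(offset): load through the pointer held in rs
        self.regs[rd] = self.stack[self._index(rs, offset)]

    def sst(self, rd, offset, rs):
        # sst rd(offset), rs: store the contents of rs through the pointer in rd
        self.stack[self._index(rd, offset)] = self.regs[rs]


m = Machine(["h0", "h1"])            # the handler's stack (playing the role of rho2)
m.mov_rd_sp("res")                   # res now holds ptr(2)
m.stack.extend(["f0", "f1", "f2"])   # frames pushed since then (rho1, length unknown)
m.regs["r1"] = 42                    # the exception packet
m.mov_sp_rs("res")                   # raise: discard the front of the stack
```

After the last line the stack is back to the two words the handler expects, regardless of how many frames were pushed in between, which is exactly what sfree cannot achieve when the length of ρ1 is unknown.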

The  introduction  of  pointers  into  the  stack  raises  a  delicate  issue  for  the  type  system.  When  the 
stack  pointer  is  copied  into  a  register,  changes  to  the  stack  are  not  reflected  in  the  type  of  the  copy 
and can invalidate a pointer. Consider the following incorrect code:

% begin with sp : τ::σ, sp ↦ w::S (τ ≠ ns)
mov r1, sp       % r1 : ptr(τ::σ)
sfree 1          % sp : σ, sp ↦ S
salloc 1         % sp : ns::σ, sp ↦ ns::S
sld r2, r1(0)    % r2 : τ but r2 ↦ ns

When execution reaches the final line, r1 still has type ptr(τ::σ), but this type is no longer consistent
with the state of the stack; the pointer in r1 points to ns.

To prevent erroneous loads of this sort, the type system requires that the pointer rs be valid when
used in the instructions sld rd, rs(i), sst rd(i), rs, and mov sp, rs. An invariant of the type system
is that the type of sp always describes the current stack, so using a pointer into the stack will be
sound if that pointer’s type is consistent with sp’s type. Suppose sp has type σ1 and r has type
ptr(σ2); then r is valid if σ2 is a tail of σ1 (formally, if there exists some σ' such that σ1 = σ' ∘ σ2).
If a pointer is invalid, it may be neither loaded from nor moved into the stack pointer. In the above
example the load is rejected because r1’s type τ::σ is not a tail of sp’s type, ns::σ.
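
The validity condition can be illustrated with a small Python sketch. It is only an approximation made for illustration: stack types are modeled as tuples listed from the top of the stack down, stack type variables such as σ are treated as opaque atoms, and the function name is_tail is hypothetical. Under these assumptions, checking that σ2 is a tail of σ1 reduces to a suffix test, and replaying the incorrect code above shows the final load being rejected.

```python
def is_tail(sigma2, sigma1):
    """Return True if sigma1 = sigma' ∘ sigma2 for some prefix sigma'.

    Stack types are tuples written top-first; a stack type variable is just
    another (opaque) element, so this is a conservative, syntactic check.
    """
    n = len(sigma2)
    return n <= len(sigma1) and sigma1[len(sigma1) - n:] == sigma2


# Replay the incorrect code at the level of types.
sp = ("tau", "sigma")        # begin with sp : τ::σ
r1 = sp                      # mov r1, sp   gives r1 : ptr(τ::σ)
valid_at_start = is_tail(r1, sp)

sp = sp[1:]                  # sfree 1      gives sp : σ
valid_after_free = is_tail(r1, sp)

sp = ("ns",) + sp            # salloc 1     gives sp : ns::σ
valid_at_load = is_tail(r1, sp)   # sld r2, r1(0) is checked here
```

In this replay valid_at_start is True, while valid_after_free and valid_at_load are both False: once the word of type τ has been freed, no later state of the stack in this code makes ptr(τ::σ) valid again.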

4.3 Using Compound Stacks

Recall the type for integer to unit functions in the presence of exceptions:

∀[ρ1, ρ2].{sp:ρ1 ∘ ρ2, r1:int, ra:{sp:ρ1 ∘ ρ2, r1:⟨⟩},
          re:{sp:ρ2, r1:exn}, res:ptr(ρ2)}

An exception may be raised within the body of such a function by restoring the handler’s stack
from res and jumping to the handler. A new exception handler may be installed by copying the
stack pointer to res and making subsequent function calls with the stack type variables instantiated
to nil and ρ1 ∘ ρ2. Calls that do not install new exception handlers would attach their frames to ρ1
and pass on ρ2 unchanged.

Since exceptions are probably raised infrequently, an implementation could save a register by storing
the exception continuation’s code pointer on the stack, instead of in its own register. If this
convention were used, functions would expect stacks with the type ρ1 ∘ (τ_handler::ρ2) and exception
pointers with the type ptr(τ_handler::ρ2), where τ_handler = ∀[].{sp:ρ2, r1:exn}.

This last convention illustrates a use for compound stacks that goes beyond implementing
exceptions. We have a general tool for locating data of type τ amidst the stack by using the calling
convention:

∀[ρ1, ρ2].{sp:ρ1 ∘ (τ::ρ2), r1:ptr(τ::ρ2), ···}

One  application  of  this  tool  would  be  for  implementing  Pascal  with  displays.  The  primary  limitation 
of  this  tool  is  that  if  more  than  one  piece  of  data  is  stored  amidst  the  stack,  although  quantification 
may  be  used  to  avoid  specifying  the  precise  locations  of  that  data,  function  calling  conventions 
would  have  to  specify  in  what  order  data  appears  on  the  stack.  It  appears  that  this  limitation 
could  be  removed  by  introducing  a  limited  form  of  intersection  type,  to  allow  a  different  view  of 
the  stack  for  each  datum  located  on  the  stack,  but  we  have  not  explored  the  ramifications  of  this 
enhancement. 


5  Compiling  to  STAL 


We make the discussion of the preceding sections concrete by presenting a formal translation that
compiles a high-level programming language with integer exceptions into STAL. The syntax of
the source language appears in Figure 6. The static semantics of the source language is given by two
judgments, a type formation judgment Δ ⊢ τ type and a term formation judgment Δ; Γ ⊢ e : τ.
The rules for the former are completely standard and are omitted; the rules for the latter can be
obtained by dropping the translation portion (⇝ C) from the translating rules that follow. Closure
conversion [20, 23] presents no interesting issues particular to this translation, so in the interest
of simplicity, we assume it has already been performed. Consequently, well-typed function terms
(fix x[Δ](x1:τ1, …, xn:τn):τ.e) must be closed.

In  order  to  illustrate  use  of  the  stack,  the  translation  uses  a  simple  stack-oriented  strategy.  No 
register  allocation  is  performed;  all  arguments  and  most  temporaries  are  stored  on  the  stack.  Also, 
no  particular  effort  is  made  to  be  efficient. 

types           τ ::= α | int | ∀[Δ].(τ1, …, τn) → τ | ⟨τ1, …, τn⟩
terms           e ::= x | i | fix x[Δ](x1:τ1, …, xn:τn):τ.e | e1 e2 | e[τ] |
                      ⟨e1, …, en⟩ | πi(e) | e1 p e2 | if0(e1, e2, e3) |
                      raise[τ] e | try e1 handle x ⇒ e2
primitives      p ::= + | − | ×
type contexts   Δ ::= α1, …, αn
value contexts  Γ ::= x1:τ1, …, xn:τn

Figure 6: Source Syntax

The translation of source types to STAL types is given below; the interesting case is the calling
convention for functions. The calling convention abstracts a set of type variables (Δ), and abstracts
stack type variables representing the front (ρ1) and back (ρ2) of the caller’s stack. The front of
the stack consists of all of the caller’s stack up to the enclosing exception handler, and the back
consists of everything behind the enclosing exception handler. On entry to a function, the stack
is to contain the function’s arguments on top of the caller’s stack. The exception register, re,
and the exception stack register, res, contain pointers to the enclosing exception handler and its
stack, respectively. Finally, the return address register, ra, contains a return pointer that expects
the result value in r1, the same stack with the arguments removed, and the exception registers
unchanged.^

|α| = α
|int| = int
|∀[Δ].(τ1, …, τn) → τ| = ∀[Δ, ρ1, ρ2].{sp : |τn|::···::|τ1|::ρ1 ∘ ρ2,
                                       ra : {r1:|τ|, sp:ρ1 ∘ ρ2, re:{r1:int, sp:ρ2}, res:ptr(ρ2)},
                                       re : {r1:int, sp:ρ2},
                                       res : ptr(ρ2)}
|⟨τ1, …, τn⟩| = ⟨|τ1|^1, …, |τn|^1⟩


The translation of source terms is given as a type-directed translation governed by the judgment
Δ; Γ ⊢ e : τ ⇝ C. The judgment is read as follows: in type context Δ and value context Γ,
the term e has type τ and translates to a STAL code sequence C. Without the translation C,
this judgment specifies the static semantics of the source language. Therefore it is clear that any
well-typed source term is compiled by this translation.

In order to simplify the translation’s presentation, we use code sequences that are permitted to
contain address labels after jmp and halt instructions:

code sequences  C ::= · | ι; C | jmp v; ℓ:code[Δ]Γ.C | halt[τ]; ℓ:code[Δ]Γ.C

These code sequences are appended together to form a conglomerate code block of the form
I; ℓ1:h1; …; ℓn:hn. Such a block is converted to an official STAL program by heap allocating all but
the first segment of instructions. Also in the interest of simplicity, we assume that all labels used
in the translation are fresh, and we use push and pop instructions as shorthand for the appropriate
allocate/store and load/free sequences.

^Note that this type does not protect the caller from modification of the exception register. The calling convention
could be rewritten to provide this protection, but we have not done so as it would significantly complicate the
presentation.

Code sequences produced by the translation assume the following preconditions: If Δ; Γ ⊢ e : τ ⇝
C, then C has free type variables contained in Δ, has free stack type variables ρ1, ρ2, and ρ3, and
expects a register file with type:

{sp : ρ3 ∘ |Γ| ∘ ρ1 ∘ ρ2,
 fp : ptr(|Γ| ∘ ρ1 ∘ ρ2),
 re : {r1:int, sp:ρ2},
 res : ptr(ρ2)}

As discussed above, the stack contains the value variables (|Γ|) in front of the caller’s stack
(ρ1 ∘ ρ2). The stack type |Γ| specifying the argument portion of the stack is defined by:

|x1:τ1, …, xn:τn| = |τn|::···::|τ1|::nil

Upon entry to C, the stack also contains an unknown series of temporaries specified by ρ3. The
variable ρ3 is free in C, so appropriate substitutions for ρ3 allow C to be used in a variety of different
environments. Since the number of temporaries is unknown, C also expects a frame pointer, fp, to
point past them to the variables. As usual, the exception registers point to the enclosing exception
handler and its stack. At the end of C, the register file has the same type, with the addition that
r1 contains the term’s result value of type |τ|.

With these preliminaries established, we are ready to present the translation’s rules. The code for a
variable reference simply finds the value at an appropriate offset from the frame pointer and places
it in r1:

-----------------------------------------------------------  (0 ≤ i < n)
Δ; (x_{n−1}:τ_{n−1}, …, x0:τ0) ⊢ xi : τi ⇝ sld r1, fp(i)


A simple example of an operation that stores temporary information on the stack is arithmetic. The
translation of e1 p e2 computes the value of e1 (placing it in r1), then pushes it onto the stack and
computes the value of e2. During the second computation, there is an additional temporary word
(on top of those specified by ρ3), so in that second computation ρ3 is instantiated with int::ρ3; this
indicates that the number of temporaries is still unknown, but is one word more than externally.
After computing the value of e2, the code retrieves the first value from the stack and performs the
arithmetic operation.

Δ; Γ ⊢ e1 : int ⇝ C1      Δ; Γ ⊢ e2 : int ⇝ C2
--------------------------------------------------
Δ; Γ ⊢ e1 p e2 : int ⇝ C1
                        push r1
                        C2[int::ρ3/ρ3]
                        pop r2
                        arith_p r1, r2, r1

(arith_+ = add, arith_− = sub, arith_× = mul)
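
The shape of this rule can be sketched as a tiny Python compiler and interpreter. This is a minimal illustration under stated assumptions: expressions are either integer literals (compiled to mov r1, i, as in Figure 9) or tuples (op, e1, e2); the names compile_expr and run are hypothetical; and the sketch tracks only run-time behavior, not the stack types or the instantiation of ρ3. Following TAL's three-operand form, arith_p r1, r2, r1 is read as r1 := r2 p r1, so the popped value of e1 is the left operand.

```python
ARITH = {"+": "add", "-": "sub", "*": "mul"}


def compile_expr(e):
    """Compile an int literal or a tuple (op, e1, e2) to a list of instructions."""
    if isinstance(e, int):
        return [("mov", "r1", e)]
    op, e1, e2 = e
    return (
        compile_expr(e1)              # C1: value of e1 ends up in r1
        + [("push", "r1")]            # save it as a temporary on the stack
        + compile_expr(e2)            # C2 runs with one extra temporary word
        + [("pop", "r2"),             # retrieve the value of e1
           (ARITH[op], "r1", "r2", "r1")]
    )


def run(code):
    """Execute the instruction list and return the final contents of r1."""
    regs, stack = {}, []
    for ins in code:
        if ins[0] == "mov":
            regs[ins[1]] = ins[2]
        elif ins[0] == "push":
            stack.append(regs[ins[1]])
        elif ins[0] == "pop":
            regs[ins[1]] = stack.pop()
        else:
            name, rd, rs, v = ins
            a, b = regs[rs], regs[v]
            regs[rd] = {"add": a + b, "sub": a - b, "mul": a * b}[name]
    assert not stack, "temporaries must be balanced at the end"
    return regs["r1"]


code = compile_expr(("-", 10, ("*", 2, 3)))
result = run(code)   # 10 - (2 * 3)
```

The nested multiplication runs while the value 10 sits on the stack as a temporary, which is the situation the instantiation of ρ3 with int::ρ3 describes at the level of types.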


Function calls are compiled (Figure 7) by evaluating the function and each of the arguments, placing
their values on the stack. Then the function pointer is retrieved, the frame pointer is stored on
the stack (above the arguments), a return address is loaded into ra, and the call is made. In the
call, the front of the stack (ρ1) is instantiated according to the current stack, which then contains
the current caller’s frame pointer, temporaries, and arguments, in addition to the previous caller’s
stack. The exception handler is unchanged so the back of the stack (ρ2) is as well.

The code for a function (Figure 8), before executing the code for the body, must establish the body’s
preconditions. It does so by pushing on the stack the recursion pointer (the one value variable that
is not an argument), saving the return address, and creating a frame pointer. After executing the
body, it retrieves the return address, frees the variables (in accordance with the calling convention),
and jumps to the return address.

Δ; Γ ⊢ e : (τ1, …, τn) → τ ⇝ C      Δ; Γ ⊢ ei : τi ⇝ Ci
------------------------------------------------------------
Δ; Γ ⊢ e(e1, …, en) : τ ⇝
        ;; sp : ρ3 ∘ |Γ| ∘ ρ1 ∘ ρ2
        C
        push r1
        C1[τ_fun::ρ3/ρ3]
        push r1
          ⋮
        Cn[|τ_{n−1}|::···::|τ1|::τ_fun::ρ3/ρ3]
        push r1
        ;; sp : |τn|::···::|τ1|::τ_fun::ρ3 ∘ |Γ| ∘ ρ1 ∘ ρ2
        sld r1, sp(n)        ;; recover call address
        sst sp(n), fp        ;; save frame pointer
        ;; sp : |τn|::···::|τ1|::ptr(|Γ| ∘ ρ1 ∘ ρ2)::ρ3 ∘ |Γ| ∘ ρ1 ∘ ρ2
        mov ra, ℓ_return[Δ, ρ1, ρ2]
        jmp r1[ptr(|Γ| ∘ ρ1 ∘ ρ2)::ρ3 ∘ |Γ| ∘ ρ1, ρ2]
    ℓ_return : code[Δ, ρ1, ρ2]{r1 : |τ|,
                               sp : ptr(|Γ| ∘ ρ1 ∘ ρ2)::ρ3 ∘ |Γ| ∘ ρ1 ∘ ρ2,
                               re : {r1:int, sp:ρ2},
                               res : ptr(ρ2)}.
        pop fp               ;; recover frame pointer

(where τ_fun = |(τ1, …, τn) → τ|
             = ∀[ρ1, ρ2].{sp : |τn|::···::|τ1|::ρ1 ∘ ρ2,
                          ra : {r1:|τ|, sp:ρ1 ∘ ρ2, re:{r1:int, sp:ρ2}, res:ptr(ρ2)},
                          re : {r1:int, sp:ρ2},
                          res : ptr(ρ2)})

Figure 7: Function Call Compilation

α⃗ ⊢ τi type      α⃗; (x1:τ1, …, xn:τn, x:∀[α⃗].(τ1, …, τn) → τ) ⊢ e : τ ⇝ C
--------------------------------------------------------------------------------
Δ; Γ ⊢ fix x[α⃗](x1:τ1, …, xn:τn):τ.e : ∀[α⃗].(τ1, …, τn) → τ ⇝
        jmp ℓ_skip[Δ, ρ1, ρ2]
    ℓ_fun : code[α⃗, ρ1, ρ2]{sp : |τn|::···::|τ1|::ρ1 ∘ ρ2,
                             ra : τ_return,
                             re : {r1:int, sp:ρ2},
                             res : ptr(ρ2)}.
        mov r1, ℓ_fun
        push r1              ;; add recursion address to context
        mov fp, sp           ;; create frame pointer
        push ra              ;; save return address
        ;; sp : τ_return::|∀[α⃗].(τ1, …, τn) → τ|::|τn|::···::|τ1|::ρ1 ∘ ρ2
        ;; fp : ptr(|∀[α⃗].(τ1, …, τn) → τ|::|τn|::···::|τ1|::ρ1 ∘ ρ2)
        C[τ_return::nil/ρ3]
        pop ra
        sfree n + 1
        jmp ra
    ℓ_skip : code[Δ, ρ1, ρ2]{sp : ρ3 ∘ |Γ| ∘ ρ1 ∘ ρ2,
                             fp : ptr(|Γ| ∘ ρ1 ∘ ρ2),
                             re : {r1:int, sp:ρ2},
                             res : ptr(ρ2)}.
        mov r1, ℓ_fun

(where τ_return = {r1:|τ|, sp:ρ1 ∘ ρ2, re:{r1:int, sp:ρ2}, res:ptr(ρ2)})

Figure 8: Function Compilation

The remaining non-exception constructs are dealt with in a straightforward manner, and are shown
in Figure 9. To raise an exception is also straightforward. After computing the exception packet
(always an integer in this language), the entire front of the stack is discarded by moving the
exception stack register res into sp, and then the exception handler is called. Any code following
the raise is dead; the postconditions (including a “result” value of type |τ|) are established by
inserting a label that is never called:

Δ ⊢ τ type      Δ; Γ ⊢ e : int ⇝ C
--------------------------------------
Δ; Γ ⊢ raise[τ] e : τ ⇝ C
        mov sp, res
        jmp re
    ℓ_deadcode : code[Δ, ρ1, ρ2]{r1 : |τ|,
                                 sp : ρ3 ∘ |Γ| ∘ ρ1 ∘ ρ2,
                                 fp : ptr(|Γ| ∘ ρ1 ∘ ρ2),
                                 re : {r1:int, sp:ρ2},
                                 res : ptr(ρ2)}.

Δ; Γ ⊢ i : int ⇝ mov r1, i


Δ ⊢ τ' type      Δ; Γ ⊢ e : ∀[α, Δ'].(τ1, …, τn) → τ ⇝ C
-----------------------------------------------------------
Δ; Γ ⊢ e[τ'] : ∀[Δ'].((τ1, …, τn) → τ)[τ'/α] ⇝ C
                                                 mov r1, r1[|τ'|]


Δ; Γ ⊢ ei : τi ⇝ Ci
-----------------------------------------------
Δ; Γ ⊢ ⟨e1, …, en⟩ : ⟨τ1, …, τn⟩ ⇝
        C1
        push r1
          ⋮
        Cn[|τ_{n−1}|::···::|τ1|::ρ3/ρ3]
        push r1
        ;; sp : |τn|::···::|τ1|::ρ3 ∘ |Γ| ∘ ρ1 ∘ ρ2
        malloc r1[|τ1|, …, |τn|]
        pop r2
        st r1(n − 1), r2
          ⋮
        pop r2
        st r1(0), r2


Δ; Γ ⊢ e : ⟨τ1, …, τn⟩ ⇝ C
------------------------------  (1 ≤ i ≤ n)
Δ; Γ ⊢ πi(e) : τi ⇝ C
                     ld r1, r1(i − 1)


Δ; Γ ⊢ e1 : int ⇝ C1      Δ; Γ ⊢ e2 : τ ⇝ C2      Δ; Γ ⊢ e3 : τ ⇝ C3
--------------------------------------------------------------------------
Δ; Γ ⊢ if0(e1, e2, e3) : τ ⇝
        C1
        bneq r1, ℓ_nonzero[Δ, ρ1, ρ2]
        C2
        jmp ℓ_skip[Δ, ρ1, ρ2]
    ℓ_nonzero : code[Δ, ρ1, ρ2]{sp : ρ3 ∘ |Γ| ∘ ρ1 ∘ ρ2,
                                fp : ptr(|Γ| ∘ ρ1 ∘ ρ2),
                                re : {r1:int, sp:ρ2},
                                res : ptr(ρ2)}.
        C3
        jmp ℓ_skip[Δ, ρ1, ρ2]
    ℓ_skip : code[Δ, ρ1, ρ2]{r1 : |τ|,
                             sp : ρ3 ∘ |Γ| ∘ ρ1 ∘ ρ2,
                             fp : ptr(|Γ| ∘ ρ1 ∘ ρ2),
                             re : {r1:int, sp:ρ2},
                             res : ptr(ρ2)}.

Figure 9: Integer literal, Instantiation, Tuple, and Branching Compilation

The code for exception handling (Figure 10) is long, but not complicated. First the old exception
registers  and  frame  pointer  are  saved  on  the  stack,  and  the  new  exception  handler  is  installed.  The 
precondition  for  translated  terms  requires  that  the  variables  be  in  front  of  the  handler’s  stack  so 
they  must  be  copied  to  the  top  of  the  stack  and  a  new  frame  pointer  must  be  created.^  The  body 
is  then  executed,  with  the  stack  type  variables  instantiated  so  that  the  temporaries  and  front  are 
empty,  and  the  back  consists  of  everything  except  the  copied  variables.  Either  the  body  is  finished 
successfully  or  control  is  transferred  to  the  exception  handler.  In  either  case  the  original  state  is 
restored,  and  if  an  exception  was  raised,  the  handler  executes  before  proceeding. 


The remaining rule is the sole rule for the judgment ⊢ e program ⇝ P. This judgment states that e
is a valid program in the source language, and that P is a STAL program that computes the value
of e. The rule establishes the translation’s precondition by installing a default exception handler
and creating a frame pointer, and bundles up the resulting code as a STAL program:

∅; ∅ ⊢ e : int ⇝ C
---------------------------------------------------------
⊢ e program ⇝ ({ℓ1 ↦ h1, …, ℓn ↦ hn}, {sp ↦ nil}, I)

where I; ℓ1:h1; …; ℓn:hn =
        mov re, ℓ_uncaught
        mov res, sp
        mov fp, sp
        C[nil, nil, nil/ρ3, ρ1, ρ2]
        halt[int]
    ℓ_uncaught : code[]{sp:nil, r1:int}.
        halt[int]


Proposition 5.1 (Type Correctness) If ⊢ e program ⇝ P then ⊢ P.


6  Related  and  Future  Work 


Our work is partially inspired by Reynolds [26], which uses functor categories to “replace
continuations by instruction sequences and store shapes by descriptions of the structure of the run-time
stack.” However, Reynolds was primarily concerned with using functors to express an intermediate
language of a semantics-based compiler for Algol, whereas we are primarily concerned with type
structure for general-purpose target languages.

Stata  and  Abadi  [30]  formalize  the  Java  bytecode  verifier’s  treatment  of  subroutines  by  giving  a 
type  system  for  a  subset  of  the  Java  Virtual  Machine  language  [19].  In  particular,  their  type  system 
ensures  that  for  any  program  control  point,  the  Java  stack  is  of  the  same  size  each  time  that  control 
point  is  reached  during  execution.  Consequently,  procedure  call  must  be  a  primitive  construct 
(which  it  is  in  the  Java  Virtual  Machine).  In  contrast,  our  treatment  supports  polymorphic  stack 
recursion,  and  hence  procedure  calls  can  be  encoded  using  existing  assembly-language  primitives. 

^This is an example of when it is inconvenient that stack types specify the order in which data appear on the
stack. In fact, this inefficiency can be removed using a more complicated precondition, but in the interest of clarity
we have not done so.

Δ; Γ ⊢ e : τ ⇝ C      Δ; Γ, x:int ⊢ e' : τ ⇝ C'
---------------------------------------------------
Δ; Γ ⊢ try e handle x ⇒ e' : τ ⇝
        push res             ;; save old handler and frame pointer
        push re
        push fp
        ;; sp : σ_handler
        ;; install new handler
        mov res, sp          ;; res : ptr(σ_handler)
        mov re, ℓ_handle[Δ, ρ1, ρ2]    ;; re : {r1:int, sp:σ_handler}
        ;; to fit convention, copy arguments below the new handler’s stack
        sld r1, fp(n − 1)
        push r1
          ⋮
        sld r1, fp(0)
        push r1
        ;; sp : |Γ| ∘ σ_handler
        mov fp, sp           ;; create new frame pointer
        ;; fp : ptr(|Γ| ∘ σ_handler)
        C[nil, nil, σ_handler/ρ3, ρ1, ρ2]
        sfree n              ;; free copied arguments
        pop fp               ;; restore old handler and frame pointer
        pop re
        pop res
        jmp ℓ_skip[Δ, ρ1, ρ2]
    ℓ_handle : code[Δ, ρ1, ρ2]{r1 : int, sp : σ_handler}.
        pop fp               ;; restore old handler and frame pointer
        pop re
        pop res
        ;; sp : ρ3 ∘ |Γ| ∘ ρ1 ∘ ρ2
        C'
        jmp ℓ_skip[Δ, ρ1, ρ2]
    ℓ_skip : code[Δ, ρ1, ρ2]{r1 : |τ|,
                             sp : ρ3 ∘ |Γ| ∘ ρ1 ∘ ρ2,
                             fp : ptr(|Γ| ∘ ρ1 ∘ ρ2),
                             re : {r1:int, sp:ρ2},
                             res : ptr(ρ2)}.

(where σ_handler = ptr(|Γ| ∘ ρ1 ∘ ρ2)::{r1:int, sp:ρ2}::ptr(ρ2)::ρ3 ∘ |Γ| ∘ ρ1 ∘ ρ2
       n = sizeof(Γ))

Figure 10: Exception Handler Compilation

More recently, O’Callahan [24] has used the mechanisms in this paper to devise an alternative,
simpler type system for Java bytecodes that differs from the Java bytecode verifier’s discipline [19].
By permitting polymorphic typing of subroutines, O’Callahan’s type system accepts strictly more
programs while preserving safety. This type system sheds light on which of the verifier’s restrictions
are essential and which are not.

Tofte  and  others  [8,  33]  have  developed  an  allocation  strategy  using  “regions.”  Regions  are  lexically 
scoped  containers  that  have  a  LIFO  ordering  on  their  lifetimes,  much  like  the  values  on  a  stack.  As 
in  our  approach,  polymorphic  recursion  on  abstracted  region  variables  plays  a  critical  role.  However, 
unlike  the  objects  in  our  stacks,  regions  are  variable-sized,  and  objects  need  not  be  allocated  into 
the  region  which  was  most  recently  created.  Furthermore,  there  is  only  one  allocation  mechanism 
in  Tofte’s  system  (the  stack  of  regions)  and  no  need  for  a  garbage  collector.  In  contrast,  STAL  only 
allows  allocation  at  the  top  of  the  stack  and  assumes  a  garbage  collector  for  heap-allocated  values. 
However, the type system for STAL is considerably simpler than the type system of Tofte et al., as
it requires no effect information in types.

Bailey and Davidson [6] also describe a specification language for modeling procedure calling
conventions and checking that implementations respect these conventions. They are able to specify
features  such  as  a  variable  number  of  arguments  that  our  formalism  does  not  address.  However, 
their  model  is  explicitly  tied  to  a  stack-based  calling  convention  and  does  not  address  features  such 
as  exception  handlers.  Furthermore,  their  approach  does  not  integrate  the  specification  of  calling 
conventions  with  a  general-purpose  type  system. 

Although  our  type  system  is  sufficiently  expressive  for  compilation  of  a  number  of  source  languages, 
it  has  several  limitations.  First,  it  cannot  support  general  pointers  into  the  stack  because  of  the 
ordering  requirements;  nor  can  stack  and  heap  pointers  be  unified  so  that  a  function  taking  a  tuple 
argument  can  be  passed  either  a  heap-allocated  or  a  stack-allocated  tuple.  Second,  threads  and 
advanced  mechanisms  for  implementing  first-class  continuations  such  as  the  work  by  Hieb  et  al.  [14] 
cannot  be  modeled  in  this  system  without  adding  new  primitives. 

Nevertheless,  we  claim  that  the  framework  presented  here  is  a  practical  approach  to  compilation. 
To  substantiate  this  claim,  we  are  constructing  a  compiler  called  TALC  that  compiles  ML  to  a 
variant  of  STAL  described  here,  suitably  adapted  for  the  32-bit  Intel  architecture.  We  have  found 
it  straightforward  to  enrich  the  target  language  type  system  to  include  support  for  other  type 
constructors,  such  as  references,  higher-order  constructors,  and  recursive  types.  The  compiler  uses 
an  unboxed  stack  allocation  style  of  continuation  passing,  as  discussed  in  this  paper. 

Although  we  have  discussed  mechanisms  for  typing  stacks  at  the  assembly  language  level,  our 
techniques  generalize  to  other  languages.  The  same  mechanisms,  including  polymorphic  recursion 
to  abstract  the  tail  of  a  stack,  can  be  used  to  introduce  explicit  stacks  in  higher  level  calculi.  An 
intermediate  language  with  explicit  stacks  would  allow  control  over  allocation  at  a  point  where 
more  information  is  available  to  guide  allocation  decisions. 


7  Summary 


We  have  given  a  type  system  for  a  typed  assembly  language  with  both  a  heap  and  a  stack.  Our 
language  is  flexible  enough  to  support  the  following  compilation  techniques:  CPS  using  either 
heap  or  stack  allocation,  a  variety  of  procedure  calling  conventions,  displays,  exceptions,  tail  call 
elimination, and callee-saves registers.

A key contribution of the type system is that it makes procedure calling conventions explicit and
provides  a  means  of  specifying  and  checking  calling  conventions  that  is  grounded  in  language  theory. 
The  type  system  also  makes  clear  the  relationship  between  heap  allocation  and  stack  allocation  of 
continuation  closures,  capturing  both  allocation  strategies  in  one  calculus. 


A  Formal  STAL  Semantics 


This  appendix  contains  a  complete  technical  description  of  our  calculus,  STAL.  The  STAL  abstract 
machine  is  very  similar  to  the  TAL  abstract  machine  (described  in  detail  in  Morrisett  et  al.  [23]). 
The  syntax  appears  in  Figure  11.  The  operational  semantics  is  given  as  a  deterministic  rewriting 
system in Figure 12. The notation $a[b/c]$ denotes capture-avoiding substitution of $b$ for $c$ in $a$. The
notation $a\{b \mapsto c\}$, where $a$ is a mapping, represents map update.


To make the presentation simpler for the branching rules, some extra notation is used for expressing
sequences of type and stack type instantiations. We use a new syntactic class ($\psi$) of type sequences:

\[
\psi ::= \cdot \mid \tau, \psi \mid \sigma, \psi
\]

The notation $w[\psi]$ stands for the natural iteration of instantiations, and the substitution notation
$I[\psi/\Delta]$ is defined to mean:

\[
\begin{array}{rcl}
I[\cdot/\cdot] & = & I \\
I[\tau, \psi \,/\, \alpha, \Delta] & = & I[\tau/\alpha][\psi/\Delta] \\
I[\sigma, \psi \,/\, \rho, \Delta] & = & I[\sigma/\rho][\psi/\Delta]
\end{array}
\]

The static semantics is similar to TAL's but requires extra judgments for definitional equality
of various forms of type. Definitional equality is needed because two stack types (such as
$(\mathit{int}{::}\mathit{nil}) \circ (\mathit{int}{::}\mathit{nil})$ and $\mathit{int}{::}\mathit{int}{::}\mathit{nil}$) may be syntactically different but represent the same type.
The judgments are summarized in Figure 13, the rules for type judgments appear in Figure 14, and
the rules for term judgments appear in Figures 15 and 16.
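
The role of definitional equality on stack types can be illustrated with a small sketch. It assumes a hypothetical tuple encoding of stack types (the constructors `NIL`, `cons`, `var`, and `app` are not part of STAL) and compares element types syntactically, which is a simplification of the full equality judgment. Because the append rules make stack types behave like lists with abstract tails, flattening both sides and comparing the results decides equality for this fragment.

```python
# Hypothetical encoding of stack types (illustration only):
#   NIL, cons(tau, sigma), var(rho), app(sigma1, sigma2)
NIL = ("nil",)

def cons(tau, sigma):
    return ("cons", tau, sigma)

def var(rho):
    return ("var", rho)

def app(sigma1, sigma2):
    return ("app", sigma1, sigma2)

def flatten(sigma):
    """Normalize a stack type to a flat list of slot types and tail variables."""
    tag = sigma[0]
    if tag == "nil":
        return []
    if tag == "cons":
        return [("ty", sigma[1])] + flatten(sigma[2])
    if tag == "var":
        return [("var", sigma[1])]
    if tag == "app":
        # nil is a unit for append, and append is associative,
        # so concatenating the flattened operands loses nothing.
        return flatten(sigma[1]) + flatten(sigma[2])
    raise ValueError(f"not a stack type: {sigma!r}")

def stack_types_equal(sigma1, sigma2):
    """Equality of stack types, with element types compared syntactically."""
    return flatten(sigma1) == flatten(sigma2)

one_int = cons("int", NIL)
two_ints = cons("int", cons("int", NIL))
print(stack_types_equal(app(one_int, one_int), two_ints))
```

In this encoding the two stack types from the example above flatten to the same list, while a stack type ending in a variable stays distinct from one ending in `nil`.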

The  principal  theorem  regarding  the  semantics  is  type  safety: 


Theorem A.1 (Type Safety) If $\vdash P$ and $P \longmapsto^{*} P'$ then $P'$ is not stuck.


The theorem is proved using the usual Subject Reduction and Progress lemmas, each of which is
proved by induction on typing derivations.

Lemma A.2 (Subject Reduction) If $\vdash P$ and $P \longmapsto P'$ then $\vdash P'$.

Lemma A.3 (Progress) If $\vdash P$ then either $P$ has the form $(H, R\{\mathrm{r1} \mapsto w\}, \mathtt{halt}[\tau])$ or there
exists $P'$ such that $P \longmapsto P'$.
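
\[
\begin{array}{lrcl}
\textit{types} & \tau & ::= & \alpha \mid \mathit{int} \mid \mathit{ns} \mid \forall[\Delta].\Gamma \mid \langle \tau_1^{\varphi_1}, \ldots, \tau_n^{\varphi_n} \rangle \mid \exists\alpha.\tau \mid \mathit{ptr}(\sigma) \\
\textit{stack types} & \sigma & ::= & \rho \mid \mathit{nil} \mid \tau{::}\sigma \mid \sigma_1 \circ \sigma_2 \\
\textit{initialization flags} & \varphi & ::= & 0 \mid 1 \\
\textit{label assignments} & \Psi & ::= & \{\ell_1{:}\tau_1, \ldots, \ell_n{:}\tau_n\} \\
\textit{type assignments} & \Delta & ::= & \cdot \mid \alpha, \Delta \mid \rho, \Delta \\
\textit{register assignments} & \Gamma & ::= & \{r_1{:}\tau_1, \ldots, r_n{:}\tau_n, \mathrm{sp}{:}\sigma\} \\[4pt]
\textit{registers} & r & ::= & \mathrm{r1} \mid \cdots \mid \mathrm{r}k \\
\textit{word values} & w & ::= & \ell \mid i \mid \mathit{ns} \mid {?\tau} \mid w[\tau] \mid w[\sigma] \mid \mathtt{pack}\ [\tau, w]\ \mathtt{as}\ \tau' \mid \mathit{ptr}(i) \\
\textit{small values} & v & ::= & r \mid w \mid v[\tau] \mid v[\sigma] \mid \mathtt{pack}\ [\tau, v]\ \mathtt{as}\ \tau' \\
\textit{heap values} & h & ::= & \langle w_1, \ldots, w_n \rangle \mid \mathtt{code}[\Delta]\Gamma.I \\
\textit{heaps} & H & ::= & \{\ell_1 \mapsto h_1, \ldots, \ell_n \mapsto h_n\} \\
\textit{register files} & R & ::= & \{r_1 \mapsto w_1, \ldots, r_n \mapsto w_n, \mathrm{sp} \mapsto S\} \\
\textit{stacks} & S & ::= & \mathit{nil} \mid w{::}S \\[4pt]
\textit{instructions} & \iota & ::= & \mathit{aop}\ r_d, r_s, v \mid \mathit{bop}\ r, v \mid \mathtt{ld}\ r_d, r_s(i) \mid \mathtt{malloc}\ r[\vec{\tau}] \mid \\
 & & & \mathtt{mov}\ r_d, v \mid \mathtt{mov}\ \mathrm{sp}, r_s \mid \mathtt{mov}\ r_d, \mathrm{sp} \mid \mathtt{salloc}\ n \mid \\
 & & & \mathtt{sfree}\ n \mid \mathtt{sld}\ r_d, \mathrm{sp}(i) \mid \mathtt{sld}\ r_d, r_s(i) \mid \\
 & & & \mathtt{sst}\ \mathrm{sp}(i), r_s \mid \mathtt{sst}\ r_d(i), r_s \mid \mathtt{st}\ r_d(i), r_s \mid \\
 & & & \mathtt{unpack}\ [\alpha, r_d], v \\
\textit{arithmetic ops} & \mathit{aop} & ::= & \mathtt{add} \mid \mathtt{sub} \mid \mathtt{mul} \\
\textit{branch ops} & \mathit{bop} & ::= & \mathtt{beq} \mid \mathtt{bneq} \mid \mathtt{bgt} \mid \mathtt{blt} \mid \mathtt{bgte} \mid \mathtt{blte} \\
\textit{instruction sequences} & I & ::= & \iota; I \mid \mathtt{jmp}\ v \mid \mathtt{halt}[\tau] \\
\textit{programs} & P & ::= & (H, R, I)
\end{array}
\]

Figure 11: Syntax of STAL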
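
\[
\begin{array}{ll}
\multicolumn{2}{l}{(H, R, I) \longmapsto P \text{ where}} \\[4pt]
\text{if } I = & \text{then } P = \\ \hline
\mathtt{add}\ r_d, r_s, v;\ I' & (H,\ R\{r_d \mapsto R(r_s) + \hat{R}(v)\},\ I') \\
 & \text{and similarly for } \mathtt{mul} \text{ and } \mathtt{sub} \\[4pt]
\mathtt{beq}\ r, v;\ I' & (H,\ R,\ I') \\
\quad \text{when } R(r) \neq 0 & \text{and similarly for } \mathtt{bneq}, \mathtt{blt}, \text{etc.} \\[4pt]
\mathtt{beq}\ r, v;\ I' & (H,\ R,\ I''[\psi/\Delta]) \\
\quad \text{when } R(r) = 0 & \text{where } \hat{R}(v) = \ell[\psi] \text{ and } H(\ell) = \mathtt{code}[\Delta]\Gamma.I'' \\
 & \text{and similarly for } \mathtt{bneq}, \mathtt{blt}, \text{etc.} \\[4pt]
\mathtt{jmp}\ v & (H,\ R,\ I'[\psi/\Delta]) \\
 & \text{where } \hat{R}(v) = \ell[\psi] \text{ and } H(\ell) = \mathtt{code}[\Delta]\Gamma.I' \\[4pt]
\mathtt{ld}\ r_d, r_s(i);\ I' & (H,\ R\{r_d \mapsto w_i\},\ I') \\
 & \text{where } R(r_s) = \ell \text{ and } H(\ell) = \langle w_0, \ldots, w_{n-1} \rangle \text{ and } 0 \le i < n \\[4pt]
\mathtt{malloc}\ r_d[\tau_1, \ldots, \tau_n];\ I' & (H\{\ell \mapsto \langle {?\tau_1}, \ldots, {?\tau_n} \rangle\},\ R\{r_d \mapsto \ell\},\ I') \\
 & \text{where } \ell \notin H \\[4pt]
\mathtt{mov}\ r_d, v;\ I' & (H,\ R\{r_d \mapsto \hat{R}(v)\},\ I') \\[4pt]
\mathtt{mov}\ r_d, \mathrm{sp};\ I' & (H,\ R\{r_d \mapsto \mathit{ptr}(|S|)\},\ I') \\
 & \text{where } R(\mathrm{sp}) = S \\[4pt]
\mathtt{mov}\ \mathrm{sp}, r_s;\ I' & (H,\ R\{\mathrm{sp} \mapsto w_j{::}\cdots{::}w_1{::}\mathit{nil}\},\ I') \\
 & \text{where } R(\mathrm{sp}) = w_n{::}\cdots{::}w_1{::}\mathit{nil} \\
 & \text{and } R(r_s) = \mathit{ptr}(j) \text{ with } 0 \le j \le n \\[4pt]
\mathtt{salloc}\ n;\ I' & (H,\ R\{\mathrm{sp} \mapsto \underbrace{\mathit{ns}{::}\cdots{::}\mathit{ns}}_{n}{::}R(\mathrm{sp})\},\ I') \\[4pt]
\mathtt{sfree}\ n;\ I' & (H,\ R\{\mathrm{sp} \mapsto S\},\ I') \\
 & \text{where } R(\mathrm{sp}) = w_1{::}\cdots{::}w_n{::}S \\[4pt]
\mathtt{sld}\ r_d, \mathrm{sp}(i);\ I' & (H,\ R\{r_d \mapsto w_i\},\ I') \\
 & \text{where } R(\mathrm{sp}) = w_0{::}\cdots{::}w_{n-1}{::}\mathit{nil} \text{ and } 0 \le i < n \\[4pt]
\mathtt{sld}\ r_d, r_s(i);\ I' & (H,\ R\{r_d \mapsto w_{j-i}\},\ I') \\
 & \text{where } R(r_s) = \mathit{ptr}(j) \text{ and } R(\mathrm{sp}) = w_n{::}\cdots{::}w_1{::}\mathit{nil} \\
 & \text{and } 0 \le i < j \le n \\[4pt]
\mathtt{sst}\ \mathrm{sp}(i), r_s;\ I' & (H,\ R\{\mathrm{sp} \mapsto w_0{::}\cdots{::}w_{i-1}{::}R(r_s){::}S\},\ I') \\
 & \text{where } R(\mathrm{sp}) = w_0{::}\cdots{::}w_i{::}S \text{ and } 0 \le i \\[4pt]
\mathtt{sst}\ r_d(i), r_s;\ I' & (H,\ R\{\mathrm{sp} \mapsto w_n{::}\cdots{::}w_{j-i+1}{::}R(r_s){::}w_{j-i-1}{::}\cdots{::}w_1{::}\mathit{nil}\},\ I') \\
 & \text{where } R(r_d) = \mathit{ptr}(j) \text{ and } R(\mathrm{sp}) = w_n{::}\cdots{::}w_1{::}\mathit{nil} \\
 & \text{and } 0 \le i < j \le n \\[4pt]
\mathtt{st}\ r_d(i), r_s;\ I' & (H\{\ell \mapsto \langle w_0, \ldots, w_{i-1}, R(r_s), w_{i+1}, \ldots, w_{n-1} \rangle\},\ R,\ I') \\
 & \text{where } R(r_d) = \ell \text{ and } H(\ell) = \langle w_0, \ldots, w_{n-1} \rangle \text{ and } 0 \le i < n \\[4pt]
\mathtt{unpack}\ [\alpha, r_d], v;\ I' & (H,\ R\{r_d \mapsto w\},\ I'[\tau/\alpha]) \\
 & \text{where } \hat{R}(v) = \mathtt{pack}\ [\tau, w]\ \mathtt{as}\ \tau'
\end{array}
\]

\[
\text{where } \hat{R}(v) =
\begin{cases}
R(r) & \text{when } v = r \\
w & \text{when } v = w \\
\hat{R}(v')[\tau] & \text{when } v = v'[\tau] \\
\mathtt{pack}\ [\tau, \hat{R}(v')]\ \mathtt{as}\ \tau' & \text{when } v = \mathtt{pack}\ [\tau, v']\ \mathtt{as}\ \tau'
\end{cases}
\]

Figure 12: Operational Semantics of STAL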
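
As an informal illustration of the stack rules in Figure 12, the following sketch interprets a subset of the stack instructions over a register file. The encoding is an assumption made for this example only: the stack is a Python list with the top at index 0, a stack pointer value $\mathit{ptr}(j)$ is the tuple `("ptr", j)`, and the instruction tags (`"salloc"`, `"sld_sp"`, and so on) and the `Stuck` exception are hypothetical names. The heap, branching, and `sst` through a pointer are omitted.

```python
NS = "ns"  # the nonsense word that fills freshly allocated slots

class Stuck(Exception):
    """Raised when no rule of this partial machine applies."""

def _ptr(w):
    """Extract j from a stack pointer value ("ptr", j)."""
    if isinstance(w, tuple) and len(w) == 2 and w[0] == "ptr":
        return w[1]
    raise Stuck(f"not a stack pointer: {w!r}")

def step(R, instr):
    """Return the register file after one stack instruction.

    R["sp"] is the stack as a list, top first: w_n :: ... :: w_1 :: nil
    is [w_n, ..., w_1], so w_k sits at index n - k.
    """
    op, *args = instr
    S = R["sp"]
    n = len(S)
    out = dict(R)
    if op == "salloc":            # salloc k: push k nonsense words
        (k,) = args
        out["sp"] = [NS] * k + S
    elif op == "sfree":           # sfree k: pop k words
        (k,) = args
        if k > n:
            raise Stuck(instr)
        out["sp"] = S[k:]
    elif op == "sld_sp":          # sld rd, sp(i)
        rd, i = args
        if not 0 <= i < n:
            raise Stuck(instr)
        out[rd] = S[i]
    elif op == "sst_sp":          # sst sp(i), rs
        i, rs = args
        if not 0 <= i < n:
            raise Stuck(instr)
        out["sp"] = S[:i] + [R[rs]] + S[i + 1:]
    elif op == "mov_rd_sp":       # mov rd, sp: rd := ptr(|S|)
        (rd,) = args
        out[rd] = ("ptr", n)
    elif op == "mov_sp_rs":       # mov sp, rs: keep the bottom j words
        (rs,) = args
        j = _ptr(R[rs])
        if not 0 <= j <= n:
            raise Stuck(instr)
        out["sp"] = S[n - j:]
    elif op == "sld_ptr":         # sld rd, rs(i): rd := w_{j-i}
        rd, rs, i = args
        j = _ptr(R[rs])
        if not 0 <= i < j <= n:
            raise Stuck(instr)
        out[rd] = S[n - (j - i)]
    else:
        raise Stuck(instr)
    return out

R = {"sp": [], "r1": 7}
for instr in [("salloc", 2), ("sst_sp", 0, "r1"), ("mov_rd_sp", "r2"),
              ("salloc", 1), ("sld_ptr", "r3", "r2", 0)]:
    R = step(R, instr)
```

The pointer saved in `r2` counts words from the bottom of the stack, so it still designates the same slot after the later `salloc` pushes another word on top.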
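
\[
\begin{array}{ll}
\textit{Judgment} & \textit{Meaning} \\ \hline
\Delta \vdash \tau & \tau \text{ is a valid type} \\
\Delta \vdash \sigma & \sigma \text{ is a valid stack type} \\
\vdash \Psi & \Psi \text{ is a valid heap type} \\
 & \text{(no context is used because heap types must be closed)} \\
\Delta \vdash \Gamma & \Gamma \text{ is a valid register file type} \\
\Delta \vdash \tau_1 = \tau_2 & \tau_1 \text{ and } \tau_2 \text{ are equal types} \\
\Delta \vdash \sigma_1 = \sigma_2 & \sigma_1 \text{ and } \sigma_2 \text{ are equal stack types} \\
\Delta \vdash \Gamma_1 = \Gamma_2 & \Gamma_1 \text{ and } \Gamma_2 \text{ are equal register file types} \\
\Delta \vdash \tau_1 \le \tau_2 & \tau_1 \text{ is a subtype of } \tau_2 \\
\Delta \vdash \Gamma_1 \le \Gamma_2 & \Gamma_1 \text{ is a register file subtype of } \Gamma_2 \\
\vdash H : \Psi & \text{the heap } H \text{ has type } \Psi \\
\Psi \vdash S : \sigma & \text{the stack } S \text{ has type } \sigma \\
\Psi \vdash R : \Gamma & \text{the register file } R \text{ has type } \Gamma \\
\Psi \vdash h : \tau\ \mathrm{hval} & \text{the heap value } h \text{ has type } \tau \\
\Psi; \Delta \vdash w : \tau\ \mathrm{wval} & \text{the word value } w \text{ has type } \tau \\
\Psi; \Delta \vdash w : \tau^{\varphi} & \text{the word value } w \text{ has flagged type } \tau^{\varphi} \\
 & \text{(i.e., } w \text{ has type } \tau \text{ or } w \text{ is } {?\tau} \text{ and } \varphi \text{ is } 0\text{)} \\
\Psi; \Delta; \Gamma \vdash v : \tau & \text{the small value } v \text{ has type } \tau \\
\Psi; \Delta; \Gamma \vdash \iota \Rightarrow \Delta'; \Gamma' & \text{instruction } \iota \text{ requires a context of type } \Delta; \Gamma \\
 & \text{and produces a context of type } \Delta'; \Gamma' \\
\Psi; \Delta; \Gamma \vdash I & I \text{ is a valid sequence of instructions} \\
\vdash P & P \text{ is a valid program}
\end{array}
\]

Figure 13: Static Semantics of STAL (judgments)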


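Judgments $\Delta \vdash \tau$, $\Delta \vdash \sigma$, $\vdash \Psi$, and $\Delta \vdash \Gamma$:

\[
\frac{\Delta \vdash \tau = \tau}{\Delta \vdash \tau}
\qquad
\frac{\Delta \vdash \sigma = \sigma}{\Delta \vdash \sigma}
\qquad
\frac{\cdot \vdash \tau_i}{\vdash \{\ell_1{:}\tau_1, \ldots, \ell_n{:}\tau_n\}}
\qquad
\frac{\Delta \vdash \Gamma = \Gamma}{\Delta \vdash \Gamma}
\]

Judgments $\Delta \vdash \tau_1 = \tau_2$, $\Delta \vdash \sigma_1 = \sigma_2$, and $\Delta \vdash \Gamma_1 = \Gamma_2$:

\[
\frac{\Delta \vdash \tau_2 = \tau_1}{\Delta \vdash \tau_1 = \tau_2}
\qquad
\frac{\Delta \vdash \tau_1 = \tau_2 \quad \Delta \vdash \tau_2 = \tau_3}{\Delta \vdash \tau_1 = \tau_3}
\]

\[
\frac{\Delta \vdash \sigma_2 = \sigma_1}{\Delta \vdash \sigma_1 = \sigma_2}
\qquad
\frac{\Delta \vdash \sigma_1 = \sigma_2 \quad \Delta \vdash \sigma_2 = \sigma_3}{\Delta \vdash \sigma_1 = \sigma_3}
\]

\[
\frac{}{\Delta \vdash \alpha = \alpha}\ (\alpha \in \Delta)
\qquad
\frac{}{\Delta \vdash \mathit{int} = \mathit{int}}
\qquad
\frac{\Delta', \Delta \vdash \Gamma_1 = \Gamma_2}{\Delta \vdash \forall[\Delta'].\Gamma_1 = \forall[\Delta'].\Gamma_2}
\]

\[
\frac{\Delta \vdash \tau_i = \tau_i'}{\Delta \vdash \langle \tau_1^{\varphi_1}, \ldots, \tau_n^{\varphi_n} \rangle = \langle \tau_1'^{\varphi_1}, \ldots, \tau_n'^{\varphi_n} \rangle}
\qquad
\frac{\alpha, \Delta \vdash \tau_1 = \tau_2}{\Delta \vdash \exists\alpha.\tau_1 = \exists\alpha.\tau_2}
\]

\[
\frac{}{\Delta \vdash \mathit{ns} = \mathit{ns}}
\qquad
\frac{\Delta \vdash \sigma_1 = \sigma_2}{\Delta \vdash \mathit{ptr}(\sigma_1) = \mathit{ptr}(\sigma_2)}
\]

\[
\frac{}{\Delta \vdash \rho = \rho}\ (\rho \in \Delta)
\qquad
\frac{}{\Delta \vdash \mathit{nil} = \mathit{nil}}
\qquad
\frac{\Delta \vdash \tau_1 = \tau_2 \quad \Delta \vdash \sigma_1 = \sigma_2}{\Delta \vdash \tau_1{::}\sigma_1 = \tau_2{::}\sigma_2}
\]

\[
\frac{\Delta \vdash \sigma_1 = \sigma_1' \quad \Delta \vdash \sigma_2 = \sigma_2'}{\Delta \vdash \sigma_1 \circ \sigma_2 = \sigma_1' \circ \sigma_2'}
\qquad
\frac{\Delta \vdash \sigma}{\Delta \vdash \mathit{nil} \circ \sigma = \sigma}
\qquad
\frac{\Delta \vdash \sigma}{\Delta \vdash \sigma \circ \mathit{nil} = \sigma}
\]

\[
\frac{\Delta \vdash \tau \quad \Delta \vdash \sigma_1 \quad \Delta \vdash \sigma_2}{\Delta \vdash (\tau{::}\sigma_1) \circ \sigma_2 = \tau{::}(\sigma_1 \circ \sigma_2)}
\qquad
\frac{\Delta \vdash \sigma_1 \quad \Delta \vdash \sigma_2 \quad \Delta \vdash \sigma_3}{\Delta \vdash (\sigma_1 \circ \sigma_2) \circ \sigma_3 = \sigma_1 \circ (\sigma_2 \circ \sigma_3)}
\]

\[
\frac{\Delta \vdash \sigma = \sigma' \quad \Delta \vdash \tau_i = \tau_i'}{\Delta \vdash \{\mathrm{sp}{:}\sigma, r_1{:}\tau_1, \ldots, r_n{:}\tau_n\} = \{\mathrm{sp}{:}\sigma', r_1{:}\tau_1', \ldots, r_n{:}\tau_n'\}}
\]

Judgments $\Delta \vdash \tau_1 \le \tau_2$ and $\Delta \vdash \Gamma_1 \le \Gamma_2$:

\[
\frac{\Delta \vdash \tau_1 = \tau_2}{\Delta \vdash \tau_1 \le \tau_2}
\qquad
\frac{\Delta \vdash \tau_1 \le \tau_2 \quad \Delta \vdash \tau_2 \le \tau_3}{\Delta \vdash \tau_1 \le \tau_3}
\]

\[
\frac{\Delta \vdash \tau_i}{\Delta \vdash \langle \tau_1^{\varphi_1}, \ldots, \tau_{i-1}^{\varphi_{i-1}}, \tau_i^{1}, \tau_{i+1}^{\varphi_{i+1}}, \ldots, \tau_n^{\varphi_n} \rangle \le \langle \tau_1^{\varphi_1}, \ldots, \tau_{i-1}^{\varphi_{i-1}}, \tau_i^{0}, \tau_{i+1}^{\varphi_{i+1}}, \ldots, \tau_n^{\varphi_n} \rangle}
\]

\[
\frac{\Delta \vdash \sigma = \sigma' \quad \Delta \vdash \tau_i = \tau_i'\ (\text{for } 1 \le i \le n) \quad \Delta \vdash \tau_i\ (\text{for } n < i \le m)}{\Delta \vdash \{\mathrm{sp}{:}\sigma, r_1{:}\tau_1, \ldots, r_m{:}\tau_m\} \le \{\mathrm{sp}{:}\sigma', r_1{:}\tau_1', \ldots, r_n{:}\tau_n'\}}\ (m \ge n)
\]

Figure 14: Static Semantics of STAL, Judgments for Types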
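
The register file subtyping rule in Figure 14 lets a jump target ignore registers it does not mention. A minimal sketch of that check follows, assuming register file types are encoded as Python dictionaries with a `"sp"` entry (a hypothetical encoding). Type equality is passed in as a parameter and defaults to syntactic equality, and the well-formedness premise on the extra registers is not checked here.

```python
def regfile_subtype(gamma1, gamma2, equal=lambda a, b: a == b):
    """Check gamma1 <= gamma2: equal stack types, and gamma1 agrees with
    gamma2 on every register that gamma2 mentions (gamma1 may have more)."""
    if not equal(gamma1["sp"], gamma2["sp"]):
        return False
    for reg, tau in gamma2.items():
        if reg == "sp":
            continue
        if reg not in gamma1 or not equal(gamma1[reg], tau):
            return False
    return True

# The current context knows about r1 and r2; the target only requires r1.
current = {"sp": "rho", "r1": "int", "r2": "int"}
target = {"sp": "rho", "r1": "int"}
print(regfile_subtype(current, target), regfile_subtype(target, current))
```

Forgetting a register is allowed, but the stack types must match exactly (up to equality), which is why the stack type is compared before any register.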
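
Judgments $\vdash P$, $\vdash H : \Psi$, $\Psi \vdash S : \sigma$, and $\Psi \vdash R : \Gamma$:

\[
\frac{\vdash H : \Psi \quad \Psi \vdash R : \Gamma \quad \Psi; \cdot; \Gamma \vdash I}{\vdash (H, R, I)}
\qquad
\frac{\vdash \Psi \quad \Psi \vdash h_i : \tau_i\ \mathrm{hval}}{\vdash \{\ell_1 \mapsto h_1, \ldots, \ell_n \mapsto h_n\} : \Psi}\ (\Psi = \{\ell_1{:}\tau_1, \ldots, \ell_n{:}\tau_n\})
\]

\[
\frac{}{\Psi \vdash \mathit{nil} : \mathit{nil}}
\qquad
\frac{\Psi; \cdot \vdash w : \tau\ \mathrm{wval} \quad \Psi \vdash S : \sigma}{\Psi \vdash w{::}S : \tau{::}\sigma}
\]

\[
\frac{\Psi \vdash S : \sigma \quad \Psi; \cdot \vdash w_i : \tau_i\ \mathrm{wval}\ (\text{for } 1 \le i \le n)}{\Psi \vdash \{\mathrm{sp} \mapsto S, r_1 \mapsto w_1, \ldots, r_m \mapsto w_m\} : \{\mathrm{sp}{:}\sigma, r_1{:}\tau_1, \ldots, r_n{:}\tau_n\}}\ (m \ge n)
\]

Judgments $\Psi \vdash h : \tau\ \mathrm{hval}$, $\Psi; \Delta \vdash w : \tau\ \mathrm{wval}$, $\Psi; \Delta \vdash w : \tau^{\varphi}$, and $\Psi; \Delta; \Gamma \vdash v : \tau$:

\[
\frac{\Psi; \cdot \vdash w_i : \tau_i^{\varphi_i}}{\Psi \vdash \langle w_1, \ldots, w_n \rangle : \langle \tau_1^{\varphi_1}, \ldots, \tau_n^{\varphi_n} \rangle\ \mathrm{hval}}
\qquad
\frac{\Delta \vdash \Gamma \quad \Psi; \Delta; \Gamma \vdash I}{\Psi \vdash \mathtt{code}[\Delta]\Gamma.I : \forall[\Delta].\Gamma\ \mathrm{hval}}
\]

\[
\frac{\Delta \vdash \tau_1 \le \tau_2}{\Psi; \Delta \vdash \ell : \tau_2\ \mathrm{wval}}\ (\Psi(\ell) = \tau_1)
\qquad
\frac{}{\Psi; \Delta \vdash i : \mathit{int}\ \mathrm{wval}}
\]

\[
\frac{\Delta \vdash \tau \quad \Psi; \Delta \vdash w : \forall[\alpha, \Delta'].\Gamma\ \mathrm{wval}}{\Psi; \Delta \vdash w[\tau] : \forall[\Delta'].\Gamma[\tau/\alpha]\ \mathrm{wval}}
\qquad
\frac{\Delta \vdash \sigma \quad \Psi; \Delta \vdash w : \forall[\rho, \Delta'].\Gamma\ \mathrm{wval}}{\Psi; \Delta \vdash w[\sigma] : \forall[\Delta'].\Gamma[\sigma/\rho]\ \mathrm{wval}}
\]

\[
\frac{\Delta \vdash \tau \quad \Psi; \Delta \vdash w : \tau'[\tau/\alpha]\ \mathrm{wval}}{\Psi; \Delta \vdash \mathtt{pack}\ [\tau, w]\ \mathtt{as}\ \exists\alpha.\tau' : \exists\alpha.\tau'\ \mathrm{wval}}
\qquad
\frac{}{\Psi; \Delta \vdash \mathit{ns} : \mathit{ns}\ \mathrm{wval}}
\]

\[
\frac{\Delta \vdash \sigma}{\Psi; \Delta \vdash \mathit{ptr}(i) : \mathit{ptr}(\sigma)\ \mathrm{wval}}\ (|\sigma| = i)
\qquad
\frac{\Delta \vdash \tau}{\Psi; \Delta \vdash {?\tau} : \tau^{0}}
\qquad
\frac{\Psi; \Delta \vdash w : \tau\ \mathrm{wval}}{\Psi; \Delta \vdash w : \tau^{\varphi}}
\]

\[
\frac{}{\Psi; \Delta; \Gamma \vdash r : \tau}\ (\Gamma(r) = \tau)
\qquad
\frac{\Psi; \Delta \vdash w : \tau\ \mathrm{wval}}{\Psi; \Delta; \Gamma \vdash w : \tau}
\]

\[
\frac{\Delta \vdash \tau \quad \Psi; \Delta; \Gamma \vdash v : \forall[\alpha, \Delta'].\Gamma'}{\Psi; \Delta; \Gamma \vdash v[\tau] : \forall[\Delta'].\Gamma'[\tau/\alpha]}
\qquad
\frac{\Delta \vdash \sigma \quad \Psi; \Delta; \Gamma \vdash v : \forall[\rho, \Delta'].\Gamma'}{\Psi; \Delta; \Gamma \vdash v[\sigma] : \forall[\Delta'].\Gamma'[\sigma/\rho]}
\]

\[
\frac{\Delta \vdash \tau \quad \Psi; \Delta; \Gamma \vdash v : \tau'[\tau/\alpha]}{\Psi; \Delta; \Gamma \vdash \mathtt{pack}\ [\tau, v]\ \mathtt{as}\ \exists\alpha.\tau' : \exists\alpha.\tau'}
\]

\[
\frac{\cdot \vdash \tau_1 = \tau_2 \quad \Psi \vdash h : \tau_2\ \mathrm{hval}}{\Psi \vdash h : \tau_1\ \mathrm{hval}}
\qquad
\frac{\Delta \vdash \tau_1 = \tau_2 \quad \Psi; \Delta \vdash w : \tau_2\ \mathrm{wval}}{\Psi; \Delta \vdash w : \tau_1\ \mathrm{wval}}
\qquad
\frac{\Delta \vdash \tau_1 = \tau_2 \quad \Psi; \Delta; \Gamma \vdash v : \tau_2}{\Psi; \Delta; \Gamma \vdash v : \tau_1}
\]

Judgment $\Psi; \Delta; \Gamma \vdash I$:

\[
\frac{\Psi; \Delta; \Gamma \vdash \iota \Rightarrow \Delta'; \Gamma' \quad \Psi; \Delta'; \Gamma' \vdash I}{\Psi; \Delta; \Gamma \vdash \iota; I}
\qquad
\frac{\Delta \vdash \Gamma_1 \le \Gamma_2 \quad \Psi; \Delta; \Gamma_1 \vdash v : \forall[\,].\Gamma_2}{\Psi; \Delta; \Gamma_1 \vdash \mathtt{jmp}\ v}
\]

\[
\frac{\Delta \vdash \tau \quad \Psi; \Delta; \Gamma \vdash \mathrm{r1} : \tau}{\Psi; \Delta; \Gamma \vdash \mathtt{halt}[\tau]}
\]

Figure 15: STAL Static Semantics, Term Constructs except Instructions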
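
Judgment $\Psi; \Delta; \Gamma \vdash \iota \Rightarrow \Delta'; \Gamma'$:

\[
\frac{\Psi; \Delta; \Gamma \vdash r_s : \mathit{int} \quad \Psi; \Delta; \Gamma \vdash v : \mathit{int}}{\Psi; \Delta; \Gamma \vdash \mathit{aop}\ r_d, r_s, v \Rightarrow \Delta; \Gamma\{r_d{:}\mathit{int}\}}
\]

\[
\frac{\Psi; \Delta; \Gamma_1 \vdash r : \mathit{int} \quad \Psi; \Delta; \Gamma_1 \vdash v : \forall[\,].\Gamma_2 \quad \Delta \vdash \Gamma_1 \le \Gamma_2}{\Psi; \Delta; \Gamma_1 \vdash \mathit{bop}\ r, v \Rightarrow \Delta; \Gamma_1}
\]

\[
\frac{\Psi; \Delta; \Gamma \vdash r_s : \langle \tau_0^{\varphi_0}, \ldots, \tau_{n-1}^{\varphi_{n-1}} \rangle}{\Psi; \Delta; \Gamma \vdash \mathtt{ld}\ r_d, r_s(i) \Rightarrow \Delta; \Gamma\{r_d{:}\tau_i\}}\ (\varphi_i = 1 \wedge 0 \le i < n)
\]

\[
\frac{\Delta \vdash \tau_i}{\Psi; \Delta; \Gamma \vdash \mathtt{malloc}\ r[\tau_1, \ldots, \tau_n] \Rightarrow \Delta; \Gamma\{r{:}\langle \tau_1^{0}, \ldots, \tau_n^{0} \rangle\}}
\qquad
\frac{\Psi; \Delta; \Gamma \vdash v : \tau}{\Psi; \Delta; \Gamma \vdash \mathtt{mov}\ r_d, v \Rightarrow \Delta; \Gamma\{r_d{:}\tau\}}
\]

\[
\frac{}{\Psi; \Delta; \Gamma \vdash \mathtt{mov}\ r_d, \mathrm{sp} \Rightarrow \Delta; \Gamma\{r_d{:}\mathit{ptr}(\sigma)\}}\ (\Gamma(\mathrm{sp}) = \sigma)
\]

\[
\frac{\Psi; \Delta; \Gamma \vdash r_s : \mathit{ptr}(\sigma_2) \quad \Delta \vdash \sigma_1 = \sigma_3 \circ \sigma_2}{\Psi; \Delta; \Gamma \vdash \mathtt{mov}\ \mathrm{sp}, r_s \Rightarrow \Delta; \Gamma\{\mathrm{sp}{:}\sigma_2\}}\ (\Gamma(\mathrm{sp}) = \sigma_1)
\]

\[
\frac{}{\Psi; \Delta; \Gamma \vdash \mathtt{salloc}\ n \Rightarrow \Delta; \Gamma\{\mathrm{sp}{:}\underbrace{\mathit{ns}{::}\cdots{::}\mathit{ns}}_{n}{::}\sigma\}}\ (\Gamma(\mathrm{sp}) = \sigma)
\]

\[
\frac{\Delta \vdash \sigma_1 = \tau_0{::}\cdots{::}\tau_{n-1}{::}\sigma_2}{\Psi; \Delta; \Gamma \vdash \mathtt{sfree}\ n \Rightarrow \Delta; \Gamma\{\mathrm{sp}{:}\sigma_2\}}\ (\Gamma(\mathrm{sp}) = \sigma_1)
\]

\[
\frac{\Delta \vdash \sigma_1 = \tau_0{::}\cdots{::}\tau_i{::}\sigma_2}{\Psi; \Delta; \Gamma \vdash \mathtt{sld}\ r_d, \mathrm{sp}(i) \Rightarrow \Delta; \Gamma\{r_d{:}\tau_i\}}\ (\Gamma(\mathrm{sp}) = \sigma_1 \wedge 0 \le i)
\]

\[
\frac{\Psi; \Delta; \Gamma \vdash r_s : \mathit{ptr}(\sigma_3) \quad \Delta \vdash \sigma_1 = \sigma_2 \circ \sigma_3 \quad \Delta \vdash \sigma_3 = \tau_0{::}\cdots{::}\tau_i{::}\sigma_4}{\Psi; \Delta; \Gamma \vdash \mathtt{sld}\ r_d, r_s(i) \Rightarrow \Delta; \Gamma\{r_d{:}\tau_i\}}\ (\Gamma(\mathrm{sp}) = \sigma_1 \wedge 0 \le i)
\]

\[
\frac{\Delta \vdash \sigma_1 = \tau_0{::}\cdots{::}\tau_i{::}\sigma_2 \quad \Psi; \Delta; \Gamma \vdash r_s : \tau}{\Psi; \Delta; \Gamma \vdash \mathtt{sst}\ \mathrm{sp}(i), r_s \Rightarrow \Delta; \Gamma\{\mathrm{sp}{:}\tau_0{::}\cdots{::}\tau_{i-1}{::}\tau{::}\sigma_2\}}\ (\Gamma(\mathrm{sp}) = \sigma_1 \wedge 0 \le i)
\]

\[
\frac{\begin{array}{c}
\Psi; \Delta; \Gamma \vdash r_d : \mathit{ptr}(\sigma_3) \quad \Psi; \Delta; \Gamma \vdash r_s : \tau \\
\Delta \vdash \sigma_1 = \sigma_2 \circ \sigma_3 \quad \Delta \vdash \sigma_3 = \tau_0{::}\cdots{::}\tau_i{::}\sigma_4 \\
\Delta \vdash \sigma_5 = \tau_0{::}\cdots{::}\tau_{i-1}{::}\tau{::}\sigma_4
\end{array}}{\Psi; \Delta; \Gamma \vdash \mathtt{sst}\ r_d(i), r_s \Rightarrow \Delta; \Gamma\{\mathrm{sp}{:}\sigma_2 \circ \sigma_5,\ r_d{:}\mathit{ptr}(\sigma_5)\}}\ (\Gamma(\mathrm{sp}) = \sigma_1 \wedge 0 \le i)
\]

\[
\frac{\Psi; \Delta; \Gamma \vdash r_d : \langle \tau_0^{\varphi_0}, \ldots, \tau_{n-1}^{\varphi_{n-1}} \rangle \quad \Psi; \Delta; \Gamma \vdash r_s : \tau_i}{\Psi; \Delta; \Gamma \vdash \mathtt{st}\ r_d(i), r_s \Rightarrow \Delta; \Gamma\{r_d{:}\langle \tau_0^{\varphi_0}, \ldots, \tau_{i-1}^{\varphi_{i-1}}, \tau_i^{1}, \tau_{i+1}^{\varphi_{i+1}}, \ldots, \tau_{n-1}^{\varphi_{n-1}} \rangle\}}\ (0 \le i < n)
\]

\[
\frac{\Psi; \Delta; \Gamma \vdash v : \exists\alpha.\tau}{\Psi; \Delta; \Gamma \vdash \mathtt{unpack}\ [\alpha, r_d], v \Rightarrow \alpha, \Delta; \Gamma\{r_d{:}\tau\}}\ (\alpha \notin \Delta)
\]

Figure 16: STAL Static Semantics, Instructions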
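
To see how the stack typing rules in Figure 16 constrain a program, the sketch below checks the four instructions that address the stack through `sp`. It assumes a simplified, hypothetical representation: a stack type is a pair of a tuple of known slot types (top first) and a tail that is either `"nil"` or the name of a stack type variable. Pointers into the stack, the heap, and the type context are left out, so this is only an approximation of the rules, not the full judgment.

```python
NS = "ns"  # type of an uninitialized stack slot

def check(gamma, instr):
    """Return the register file type after `instr`, or raise TypeError.

    gamma maps register names to types; gamma["sp"] is (prefix, tail),
    where prefix is a tuple of known slot types and tail is "nil" or
    the name of a stack type variable abstracting the rest of the stack.
    """
    op, *args = instr
    prefix, tail = gamma["sp"]
    out = dict(gamma)
    if op == "salloc":                 # new slots have type ns
        (n,) = args
        out["sp"] = ((NS,) * n + prefix, tail)
    elif op == "sfree":                # needs sp = t0 :: ... :: t(n-1) :: sigma2
        (n,) = args
        if n > len(prefix):
            raise TypeError("sfree: fewer than n slots are known")
        out["sp"] = (prefix[n:], tail)
    elif op == "sld":                  # sld rd, sp(i): rd gets type t_i
        rd, i = args
        if not 0 <= i < len(prefix):
            raise TypeError("sld: slot i is not known")
        out[rd] = prefix[i]
    elif op == "sst":                  # sst sp(i), rs: slot i takes the type of rs
        i, rs = args
        if not 0 <= i < len(prefix) or rs not in gamma:
            raise TypeError("sst: slot i or register rs is not known")
        out["sp"] = (prefix[:i] + (gamma[rs],) + prefix[i + 1:], tail)
    else:
        raise ValueError(f"unsupported instruction: {instr!r}")
    return out

# A code block that is polymorphic in the caller's stack (tail "rho").
gamma = {"sp": ((), "rho"), "r1": "int"}
for instr in [("salloc", 2), ("sst", 0, "r1"), ("sld", "r2", 0), ("sfree", 2)]:
    gamma = check(gamma, instr)
```

After the sequence, the stack type is again just the variable tail, and any further `sfree` is rejected because the checker knows nothing about the slots hidden behind `rho`. This mirrors how abstracting the tail of the stack protects the caller's frames.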